ON THE PROBABLE ERRORS OF FREQUENCY
CONSTANTS. 


PART II.


EDITORIAL. 


(1) THE probable errors of frequency constants for a single variable have been
discussed in an elementary manner in Vol. II. pp. 273-281 of this Journal. The
following notes on the probable errors of frequency constants of distributions with
two variables have been provided at the request of certain of our readers*.


All the constants of a frequency distribution of two variates are expressible in
terms of the higher product moments defined by

\[ p'_{q,q'} = S\{n_{ss'}\, x_s^{q}\, y_{s'}^{q'}\}/N, \]

where the origin is at fixed values of $x$ and $y$. Transferred to the mean $\bar{x}$, $\bar{y}$, we
have:

\[ p_{q,q'} = S\{n_{ss'}\,(x_s - \bar{x})^{q}\,(y_{s'} - \bar{y})^{q'}\}/N, \]

where $\bar{x}$ and $\bar{y}$ will vary from sample to sample. Here $n_{ss'}$ is the frequency of
individuals with characters $x_s$ and $y_{s'}$, and $N$ is the total population.
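As a numerical illustration of the two definitions, the following minimal Python sketch computes the product moments about a fixed origin and about the mean from a correlation table. The function names and the small 2 × 2 table are hypothetical, introduced only for illustration; they do not appear in the original notes.

```python
from typing import Dict, Tuple

# A correlation table: (x_s, y_s') -> frequency n_ss'.
Table = Dict[Tuple[float, float], int]


def raw_moment(table: Table, q: int, qp: int) -> float:
    """Product moment p'_{q,q'} about the fixed origin (0, 0)."""
    n_total = sum(table.values())
    return sum(n * x**q * y**qp for (x, y), n in table.items()) / n_total


def central_moment(table: Table, q: int, qp: int) -> float:
    """Product moment p_{q,q'} about the sample means (x-bar, y-bar)."""
    n_total = sum(table.values())
    x_bar = raw_moment(table, 1, 0)
    y_bar = raw_moment(table, 0, 1)
    return sum(
        n * (x - x_bar) ** q * (y - y_bar) ** qp for (x, y), n in table.items()
    ) / n_total


# Hypothetical table, invented for illustration only.
example: Table = {(0.0, 0.0): 3, (0.0, 1.0): 1, (1.0, 0.0): 1, (1.0, 1.0): 3}

p11 = central_moment(example, 1, 1)
p20 = central_moment(example, 2, 0)
p02 = central_moment(example, 0, 2)
r_xy = p11 / (p20 * p02) ** 0.5  # correlation expressed through the moments
```

Every constant used later (standard deviations, the β's, the correlation) is a function of such moments, which is why the sampling errors of the moments suffice.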


Throughout $r_{\alpha\beta}$ will denote the correlation between two quantities $\alpha$ and $\beta$;
$\sigma_\alpha$, $\sigma_\beta$ will denote the standard deviations of $\alpha$ and $\beta$.


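The following results are well known:

\[ \sigma^2_{n_{ss'}} = n_{ss'}\left(1 - \frac{n_{ss'}}{N}\right), \qquad \sigma_{n_{ss'}}\sigma_{n_{tt'}}\, r_{n_{ss'}n_{tt'}} = -\frac{n_{ss'}\, n_{tt'}}{N}. \]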


* Reproduced from Lecture Notes. 
Further if $n_s = S(n_{ss'})$ for all values of $s'$,

\[ \sigma_{n_s}\sigma_{n_{tt'}}\, r_{n_s n_{tt'}} = -\frac{n_s\, n_{tt'}}{N}, \qquad \sigma_{n_s}\sigma_{n_{ss'}}\, r_{n_s n_{ss'}} = n_{ss'}\left(1 - \frac{n_s}{N}\right). \tag{vi} \]

See Biometrika, Vol. V. pp. 191-2.


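(2) We have, if $\delta$ denote a variation in any frequency constant due to random
sampling,

\[ N\,\delta p'_{q,q'} = S(\delta n_{ss'}\, x_s^{q}\, y_{s'}^{q'}) \tag{vii bis} \]

and if $m$ be number of random samples:

\[
\begin{aligned}
N^2\sigma^2_{p'_{q,q'}} &= \frac{1}{m}\, S\,(N\,\delta p'_{q,q'})^2 \\
&= S(\sigma^2_{n_{ss'}}\, x_s^{2q}\, y_{s'}^{2q'}) + 2S(\sigma_{n_{ss'}}\sigma_{n_{tt'}}\, r_{n_{ss'}n_{tt'}}\, x_s^{q} x_t^{q}\, y_{s'}^{q'} y_{t'}^{q'}) \\
&= S\left\{n_{ss'}\left(1 - \frac{n_{ss'}}{N}\right) x_s^{2q}\, y_{s'}^{2q'}\right\} - 2S\left(\frac{n_{ss'}\, n_{tt'}}{N}\, x_s^{q} x_t^{q}\, y_{s'}^{q'} y_{t'}^{q'}\right) \\
&= N\,(p'_{2q,2q'} - p'^{\,2}_{q,q'}).
\end{aligned}
\]

Thus

\[ \sigma_{p'_{q,q'}} = \sqrt{\frac{p'_{2q,2q'} - p'^{\,2}_{q,q'}}{N}} \tag{viii} \]

Again $N\,\delta p'_{u,w'} = S(\delta n_{ss'}\, x_s^{u}\, y_{s'}^{w'})$,

\[
\begin{aligned}
N^2\sigma_{p'_{q,q'}}\sigma_{p'_{u,w'}}\, r_{p'_{q,q'}p'_{u,w'}} &= S(\sigma_{n_{ss'}}\sigma_{n_{tt'}}\, r_{n_{ss'}n_{tt'}}\, x_s^{q} y_{s'}^{q'}\, x_t^{u} y_{t'}^{w'}) + S(\sigma^2_{n_{ss'}}\, x_s^{q+u}\, y_{s'}^{q'+w'}) \\
&= N\,(p'_{q+u,q'+w'} - p'_{q,q'}\, p'_{u,w'}),
\end{aligned}
\]

thus

\[ \sigma_{p'_{q,q'}}\sigma_{p'_{u,w'}}\, r_{p'_{q,q'}p'_{u,w'}} = \frac{p'_{q+u,q'+w'} - p'_{q,q'}\, p'_{u,w'}}{N} \tag{ix} \]

(viii) and (ix) refer only to the higher moment coefficients about a fixed origin.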
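Results (viii) and (ix) translate directly into code. The sketch below is a hedged illustration only: the helper names and the 2 × 2 table are hypothetical, and the moments of the table are simply treated as if they were the population values. With $q = 1$, $q' = 0$ the first function reduces to the familiar $\sigma_x/\sqrt{N}$ for a mean.

```python
import math
from typing import Dict, Tuple

Table = Dict[Tuple[float, float], int]  # (x_s, y_s') -> n_ss'


def raw_moment(table: Table, q: int, qp: int) -> float:
    """p'_{q,q'} about the fixed origin."""
    n_total = sum(table.values())
    return sum(n * x**q * y**qp for (x, y), n in table.items()) / n_total


def sd_raw_moment(table: Table, q: int, qp: int) -> float:
    """Standard deviation of p'_{q,q'} under random sampling, as in (viii)."""
    n_total = sum(table.values())
    variance = (raw_moment(table, 2 * q, 2 * qp) - raw_moment(table, q, qp) ** 2) / n_total
    return math.sqrt(variance)


def cov_raw_moments(table: Table, q: int, qp: int, u: int, wp: int) -> float:
    """sigma * sigma * r for p'_{q,q'} and p'_{u,w'}, as in (ix)."""
    n_total = sum(table.values())
    return (
        raw_moment(table, q + u, qp + wp)
        - raw_moment(table, q, qp) * raw_moment(table, u, wp)
    ) / n_total


# Hypothetical table, invented for illustration only.
example: Table = {(0.0, 0.0): 3, (0.0, 1.0): 1, (1.0, 0.0): 1, (1.0, 1.0): 3}

sd_mean_x = sd_raw_moment(example, 1, 0)          # equals sigma_x / sqrt(N)
cov_means = cov_raw_moments(example, 1, 0, 0, 1)  # equals p_11 / N
```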

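About the mean we have

\[ N\,p_{q,q'} = S\{n_{ss'}\,(x_s - \bar{x})^{q}\,(y_{s'} - \bar{y})^{q'}\}, \]

\[ N\,\delta p_{q,q'} = S\{\delta n_{ss'}\,(x_s - \bar{x})^{q}\,(y_{s'} - \bar{y})^{q'}\} - q\,N\,\delta\bar{x}\; p_{q-1,q'} - q'\,N\,\delta\bar{y}\; p_{q,q'-1} \tag{x} \]

Now it is clear that before going further we want to know the correlation
between variations in $n_{ss'}$ and $\bar{x}$ or $\bar{y}$. Now

\[ N\bar{x} = S(n_t\, x_t), \]
\[ N\,\delta\bar{x} = S(\delta n_t\, x_t), \]
\[ N\,\delta\bar{x}\,\delta n_{ss'} = S(\delta n_{ss'}\,\delta n_t\, x_t), \]

\[
\begin{aligned}
N\,\sigma_{\bar{x}}\sigma_{n_{ss'}}\, r_{\bar{x}n_{ss'}} &= n_{ss'}\left(1 - \frac{n_s}{N}\right) x_s - S\left(\frac{n_{ss'}\, n_t}{N}\, x_t\right) \\
&= n_{ss'}\,(x_s - \bar{x}), \quad \text{using (vi) and (vii)},
\end{aligned}
\]

where the second sum extends to all values of $t$ other than $s$,

\[ \text{or} \qquad \sigma_{\bar{x}}\sigma_{n_{ss'}}\, r_{\bar{x}n_{ss'}} = \frac{n_{ss'}\,(x_s - \bar{x})}{N} \tag{xi} \]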






















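Next we want $\sigma_{\bar{x}}\sigma_{\bar{y}}\, r_{\bar{x}\bar{y}}$, but this is known, or easily found to be $p_{1,1}/N$.
Returning to (x), squaring, summing for every possible sample and dividing by
the number of samples and by $N$, we have

\[
\begin{aligned}
N\,\sigma^2_{p_{q,q'}} = {} & p_{2q,2q'} - p^2_{q,q'} + q^2\, p_{2,0}\, p^2_{q-1,q'} + q'^2\, p_{0,2}\, p^2_{q,q'-1} + 2qq'\, p_{1,1}\, p_{q-1,q'} \times p_{q,q'-1} \\
& - 2q\, p_{q+1,q'}\, p_{q-1,q'} - 2q'\, p_{q,q'+1}\, p_{q,q'-1}
\end{aligned}
\tag{xii}
\]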
We may now find the correlation between any pair of higher moment
coefficients,

\[ N\,\delta p_{u,w'} = S\{\delta n_{ss'}\,(x_s - \bar{x})^{u}\,(y_{s'} - \bar{y})^{w'}\} - u\,N\,\delta\bar{x}\; p_{u-1,w'} - w'\,N\,\delta\bar{y}\; p_{u,w'-1} \tag{xiii} \]

Multiplying (x) and (xiii) together, summing for all possible samples and
dividing by the number of samples and $N$, we have

\[
\begin{aligned}
N\,\sigma_{p_{q,q'}}\sigma_{p_{u,w'}}\, r_{p_{q,q'}p_{u,w'}} = {} & p_{q+u,q'+w'} - p_{q,q'}\, p_{u,w'} + qu\, p_{2,0}\, p_{q-1,q'}\, p_{u-1,w'} + q'w'\, p_{0,2}\, p_{q,q'-1}\, p_{u,w'-1} \\
& + qw'\, p_{1,1}\, p_{q-1,q'}\, p_{u,w'-1} + q'u\, p_{1,1}\, p_{q,q'-1}\, p_{u-1,w'} \\
& - u\, p_{q+1,q'}\, p_{u-1,w'} - w'\, p_{q,q'+1}\, p_{u,w'-1} \\
& - q\, p_{u+1,w'}\, p_{q-1,q'} - q'\, p_{u,w'+1}\, p_{q,q'-1}
\end{aligned}
\tag{xiv}
\]

(xii) and (xiv) contain all the requisite data for the probable errors of random
sampling in the case of two variables. They, of course, contain implicitly the case
of one variable, as we have only to put $q'$ and $w'$ zero in order to fall back upon the
formulae (vii) and (viii) of the first part of this paper (Biometrika, Vol. II.
pp. 276-7).
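Formula (xiv) is long enough that a mechanical transcription is a useful check on the algebra. The following sketch is an illustration under stated assumptions, not part of the original notes: the function name is hypothetical, and the test moments are those of a Gaussian surface with unit standard deviations and correlation `rho`, a standard result assumed here. For that surface the sketch reproduces a correlation of $\rho^2$ between the two second moments.

```python
from typing import Callable

Moment = Callable[[int, int], float]  # p(i, j) -> central product moment p_{i,j}


def cov_central_moments(p: Moment, q: int, qp: int, u: int, wp: int, n_total: int) -> float:
    """sigma * sigma * r for p_{q,q'} and p_{u,w'} about the mean, following (xiv)."""

    def m(i: int, j: int) -> float:
        # Moments of negative order only occur with a zero coefficient.
        return 0.0 if i < 0 or j < 0 else p(i, j)

    total = (
        m(q + u, qp + wp) - m(q, qp) * m(u, wp)
        + q * u * m(2, 0) * m(q - 1, qp) * m(u - 1, wp)
        + qp * wp * m(0, 2) * m(q, qp - 1) * m(u, wp - 1)
        + q * wp * m(1, 1) * m(q - 1, qp) * m(u, wp - 1)
        + qp * u * m(1, 1) * m(q, qp - 1) * m(u - 1, wp)
        - u * m(q + 1, qp) * m(u - 1, wp)
        - wp * m(q, qp + 1) * m(u, wp - 1)
        - q * m(u + 1, wp) * m(q - 1, qp)
        - qp * m(u, wp + 1) * m(q, qp - 1)
    )
    return total / n_total


def gaussian_moments(rho: float) -> Moment:
    """Central moments up to order four of a unit-variance Gaussian surface (assumed)."""
    known = {
        (0, 0): 1.0, (2, 0): 1.0, (0, 2): 1.0, (1, 1): rho,
        (4, 0): 3.0, (0, 4): 3.0, (2, 2): 1.0 + 2.0 * rho**2,
        (3, 1): 3.0 * rho, (1, 3): 3.0 * rho,
    }
    return lambda i, j: known.get((i, j), 0.0)  # odd-order moments vanish


p = gaussian_moments(0.5)
N = 100
var_p20 = cov_central_moments(p, 2, 0, 2, 0, N)   # the special case (xii)
var_p02 = cov_central_moments(p, 0, 2, 0, 2, N)
cov_p20_p02 = cov_central_moments(p, 2, 0, 0, 2, N)
var_p11 = cov_central_moments(p, 1, 1, 1, 1, N)
r_p20_p02 = cov_p20_p02 / (var_p20 * var_p02) ** 0.5
```

Setting `u = q` and `wp = qp` gives (xii) as a special case, so a single function covers both formulae.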
(3) We may illustrate as follows:

(α) Correlation of errors in means:

\[ r_{\bar{x}\bar{y}} = r_{xy} \tag{xv} \]

(β) Correlation of errors in standard deviations*:

\[ N\,\sigma_{p_{20}}\sigma_{p_{02}}\, r_{p_{20}p_{02}} = p_{22} - p_{20}\, p_{02}, \]

\[ \sigma_{\sigma_x}\sigma_{\sigma_y}\, r_{\sigma_x\sigma_y} = \frac{p_{22} - p_{20}\, p_{02}}{4\,\sigma_x\sigma_y\, N}, \]

\[ r_{\sigma_x\sigma_y} = \frac{p_{22} - p_{20}\, p_{02}}{\sqrt{p_{40} - p_{20}^2}\;\sqrt{p_{04} - p_{02}^2}} \tag{xvi} \]

This is the general value of the correlation between the standard deviations of two
correlated variables. We may write it in the form

\[ r_{\sigma_x\sigma_y} = \frac{p_{22}/(p_{20}\, p_{02}) - 1}{\sqrt{(\beta_2 - 1)(\beta_2' - 1)}}, \]

where $\beta_2$ and $\beta_2'$ are the second $\beta$'s for the two variables respectively.
We may now investigate $p_{22}$,

\[ p_{22} = \frac{1}{N}\, S\{(x_s - \bar{x})^2\,(y_{s'} - \bar{y})^2\, n_{ss'}\}. \]

* We may safely write $p_{20}$ for $p_{2,0}$ etc. when we leave the general formulae, but $p_{qq'-1}$ for $p_{q,q'-1}$ is
capable of misinterpretation.
Let us sum for $s'$ and keep $x_s$ constant; then

\[ S\{(y_{s'} - \bar{y})^2\, n_{ss'}\} = n_s\{\sigma^2_{y_s} + (\bar{y}_s - \bar{y})^2\}, \]

where $\sigma_{y_s}$ is the standard deviation and $\bar{y}_s$ the mean of the array of $y$'s for a given
$x_s$. If the regression be linear and homoscedastic, then

\[ \bar{y}_s - \bar{y} = r_{xy}\,\frac{\sigma_y}{\sigma_x}\,(x_s - \bar{x}), \]

and

\[ \sigma^2_{y_s} = \sigma_y^2\,(1 - r_{xy}^2). \]

Hence

\[
\begin{aligned}
p_{22} &= \frac{\sigma_y^2}{N}\, S\left\{n_s\,(x_s - \bar{x})^2\,(1 - r_{xy}^2) + n_s\,(x_s - \bar{x})^4\,\frac{r_{xy}^2}{\sigma_x^2}\right\} \\
&= \sigma_x^2\sigma_y^2\,\{1 - r_{xy}^2 + r_{xy}^2\,\beta_2\},
\end{aligned}
\]

\[ \frac{p_{22}}{p_{20}\, p_{02}} - 1 = r_{xy}^2\,(\beta_2 - 1). \]

Similarly

\[ \frac{p_{22}}{p_{20}\, p_{02}} - 1 = r_{xy}^2\,(\beta_2' - 1), \]

on the assumption that the other regression is also linear and homoscedastic.

It is accordingly impossible for two variables to have a regression linear and
homoscedastic in both senses unless $\beta_2$ has the same value for both variables.
Clearly for most practical purposes we may take

\[ \frac{p_{22}}{p_{20}\, p_{02}} - 1 = r_{xy}^2\,\sqrt{(\beta_2 - 1)(\beta_2' - 1)}. \]

Thus approximately

\[ r_{\sigma_x\sigma_y} = r_{xy}^2 \tag{xvii} \]

This is identical with the result found by Pearson and Filon (Phil. Trans.
Vol. 191, A, p. 242) on the assumption of normal correlation. It is now seen to
be true, whenever we may assert linearity of regression and homoscedastic
distribution for both variables.


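(γ) Probable error of a coefficient of correlation.

Put

\[ r_{xy} = \frac{p_{11}}{\sqrt{p_{20}\, p_{02}}}, \]

\[ \frac{\delta r_{xy}}{r_{xy}} = \frac{\delta p_{11}}{p_{11}} - \frac{1}{2}\,\frac{\delta p_{20}}{p_{20}} - \frac{1}{2}\,\frac{\delta p_{02}}{p_{02}}. \]

Square, add for all random samples and divide by their number:

\[
\frac{\sigma^2_{r_{xy}}}{r_{xy}^2} = \frac{1}{N}\left\{\frac{p_{22} - p_{11}^2}{p_{11}^2} + \frac{1}{4}\,\frac{p_{40} - p_{20}^2}{p_{20}^2} + \frac{1}{4}\,\frac{p_{04} - p_{02}^2}{p_{02}^2} + \frac{1}{2}\,\frac{p_{22} - p_{20}\, p_{02}}{p_{20}\, p_{02}} - \frac{p_{31} - p_{11}\, p_{20}}{p_{11}\, p_{20}} - \frac{p_{13} - p_{11}\, p_{02}}{p_{11}\, p_{02}}\right\}
\tag{xviii}
\]

This is the most general value of the standard deviation $\sigma_{r_{xy}}$ of a correlation
coefficient. It was first given by Sheppard (Phil. Trans. Vol. 192, A, p. 128, with
an obvious printer's slip, the omission of $r_{xy}^2$). It is clear that we require a
knowledge of at least an approximate value of $p_{31}$ and $p_{13}$ as well as $p_{22}$ in order to
simplify this expression, which is far too cumbersome for practical use.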
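Now

\[
\begin{aligned}
N \times p_{31} &= S\{n_{ss'}\,(x_s - \bar{x})^3\,(y_{s'} - \bar{y})\} \\
&= S\{n_s\,(\bar{y}_s - \bar{y})\,(x_s - \bar{x})^3\},
\end{aligned}
\]

if we sum for the $s$th array of $y$'s.
But, if the regression be linear,

\[ \bar{y}_s - \bar{y} = r_{xy}\,\frac{\sigma_y}{\sigma_x}\,(x_s - \bar{x}), \]

therefore

\[ N \times p_{31} = r_{xy}\,\frac{\sigma_y}{\sigma_x}\, S\{n_s\,(x_s - \bar{x})^4\}, \]

\[ p_{31} = r_{xy}\,\sigma_y\,\sigma_x^3\,\beta_2 \tag{xix} \]

Similarly

\[ p_{13} = r_{xy}\,\sigma_x\,\sigma_y^3\,\beta_2'. \]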

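We can now substitute in (xviii), if we determine what value to give to $p_{22}$. If
we take

\[ p_{22} = \sigma_x^2\sigma_y^2\,\{1 - r_{xy}^2 + r_{xy}^2 \times \tfrac{1}{2}(\beta_2 + \beta_2')\}, \]

or

\[ \frac{p_{22}}{p_{20}\, p_{02}} - 1 = r_{xy}^2 \times \frac{\beta_2 - 1 + \beta_2' - 1}{2}, \]

and

\[ \frac{p_{22}}{p_{11}^2} = \frac{1}{r_{xy}^2} - 1 + \tfrac{1}{2}(\beta_2 + \beta_2'), \]

we have

\[ \frac{p_{22}}{p_{11}^2} - 1 = \frac{1 - r_{xy}^2}{r_{xy}^2} + \tfrac{1}{2}(\beta_2 - 1 + \beta_2' - 1). \]

Hence

\[
\begin{aligned}
\frac{\sigma^2_{r_{xy}}}{r_{xy}^2} &= \frac{1}{N}\left\{\frac{1 - r_{xy}^2}{r_{xy}^2} + \tfrac{1}{2}(\beta_2 - 1 + \beta_2' - 1)(1 + \tfrac{1}{2} r_{xy}^2) + \tfrac{1}{4}(\beta_2 - 1) + \tfrac{1}{4}(\beta_2' - 1) - (\beta_2 - 1) - (\beta_2' - 1)\right\} \\
&= \frac{1 - r_{xy}^2}{N\, r_{xy}^2}\,\{1 - \tfrac{1}{4}(\beta_2 - 1 + \beta_2' - 1)\, r_{xy}^2\},
\end{aligned}
\]

\[ \sigma^2_{r_{xy}} = \frac{1 - r_{xy}^2}{N}\,\{1 - r_{xy}^2 - \tfrac{1}{4}(\beta_2 - 3 + \beta_2' - 3)\, r_{xy}^2\}. \]

Or

\[ \sigma_{r_{xy}} = \frac{1 - r_{xy}^2}{\sqrt{N}}\left\{1 - \tfrac{1}{4}(\beta_2 - 3 + \beta_2' - 3)\,\frac{r_{xy}^2}{1 - r_{xy}^2}\right\}^{1/2} \tag{xxi} \]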
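A short sketch of (xxi) may make the size of the kurtosis correction easier to judge. The function name and default arguments are hypothetical; the factor 0.67449 converting a standard deviation into a probable error is the usual Gaussian constant, assumed here rather than taken from the text.

```python
import math

PROBABLE_ERROR_FACTOR = 0.67449  # probable error = 0.67449 x standard deviation


def sd_correlation(r: float, n_total: int, beta2_x: float = 3.0, beta2_y: float = 3.0) -> float:
    """Standard deviation of r_xy following (xxi).

    With beta2_x = beta2_y = 3 (zero kurtosis) this reduces to
    (1 - r^2) / sqrt(N).
    """
    correction = 0.25 * ((beta2_x - 3.0) + (beta2_y - 3.0)) * r**2 / (1.0 - r**2)
    factor = 1.0 - correction
    if factor < 0.0:
        raise ValueError("r exceeds the limit compatible with the given beta2 values")
    return (1.0 - r**2) / math.sqrt(n_total) * math.sqrt(factor)


# Illustrative values only: r = 0.5 from N = 100 pairs.
sd_mesokurtic = sd_correlation(0.5, 100)              # zero kurtosis
sd_leptokurtic = sd_correlation(0.5, 100, 4.0, 4.0)   # beta2 = 4 for both variates
pe_mesokurtic = PROBABLE_ERROR_FACTOR * sd_mesokurtic
```

In this example raising both $\beta_2$'s from 3 to 4 reduces the standard deviation by the factor $\sqrt{5/6}$, and the reduction grows as $r_{xy}$ approaches unity.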


This result is of much interest. If the kurtosis be zero, then

\[ \beta_2 - 3 = \beta_2' - 3 = 0, \]

and we have

\[ \sigma_{r_{xy}} = \frac{1 - r_{xy}^2}{\sqrt{N}} \tag{xxii} \]

the value originally given by Pearson and Filon for the standard deviation of
a correlation coefficient when the frequency surface is Gaussian (Phil. Trans.
Vol. 191, A, p. 242). We see accordingly that:


(i) Equal kurtosis is needful in the two variates if the regression is to be 
linear and the arrays to be homoscedastic in the case of each variable. 


(ii) The ordinary value subject to (i) is only correct provided the kurtosis 
is zero, and this is true whether the distribution be Gaussian or not. 


(iii) The ordinary formula may give very inaccurate results, if the kurtosis be 
considerable and the correlation high. 


(iv) It is probable that (xxi), as we have taken a mean value for $p_{22}$, gives
fairly good results even when the correlation is not linear.


Clearly we must always have

\[ r_{xy} \leq \frac{2}{\sqrt{\beta_2 - 1 + \beta_2' - 1}} \tag{xxiii} \]

Or, for linear regression in homoscedastic systems there is a superior limit to the
correlation possible with given values of the kurtosis. This is an interesting point,
and forms a remarkable limitation on the nature of double linear homoscedastic
regression.
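The limit (xxiii) is the value of $r_{xy}$ at which the expression under the root in (xxi) vanishes. The hedged sketch below, with hypothetical function names, checks this numerically for one illustrative pair of $\beta_2$ values.

```python
import math


def max_correlation(beta2_x: float, beta2_y: float) -> float:
    """Superior limit of r_xy from (xxiii); capped at 1, since no correlation exceeds unity."""
    return min(1.0, 2.0 / math.sqrt((beta2_x - 1.0) + (beta2_y - 1.0)))


def variance_factor(r: float, beta2_x: float, beta2_y: float) -> float:
    """The bracket 1 - (1/4)(beta2 - 3 + beta2' - 3) r^2 / (1 - r^2) of (xxi)."""
    return 1.0 - 0.25 * ((beta2_x - 3.0) + (beta2_y - 3.0)) * r**2 / (1.0 - r**2)


# Illustrative values only: both variates with beta2 = 5.
r_limit = max_correlation(5.0, 5.0)                     # 2 / sqrt(8)
factor_at_limit = variance_factor(r_limit, 5.0, 5.0)    # vanishes at the limit
r_limit_gaussian = max_correlation(3.0, 3.0)            # no restriction below unity
```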


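(4) We may now find the correlation between a product-moment $p'_{q,q'}$ about
fixed axes and $p_{u,w'}$, a product-moment about axes through the centroid. We have
to multiply together (vii) bis and (xiii). We have

\[
\begin{aligned}
N^2\sigma_{p'_{q,q'}}\sigma_{p_{u,w'}}\, r_{p'_{q,q'}p_{u,w'}} = {} & S\left\{n_{ss'}\left(1 - \frac{n_{ss'}}{N}\right) x_s^{q}\, y_{s'}^{q'}\,(x_s - \bar{x})^{u}\,(y_{s'} - \bar{y})^{w'}\right\} \\
& - S\left\{\frac{n_{ss'}\, n_{tt'}}{N}\, x_s^{q}\, y_{s'}^{q'}\,(x_t - \bar{x})^{u}\,(y_{t'} - \bar{y})^{w'}\right\} \\
& - u\, p_{u-1,w'}\, S\{n_{ss'}\,(x_s - \bar{x})\, x_s^{q}\, y_{s'}^{q'}\} \\
& - w'\, p_{u,w'-1}\, S\{n_{ss'}\,(y_{s'} - \bar{y})\, x_s^{q}\, y_{s'}^{q'}\}
\end{aligned}
\tag{xxiv}
\]

We can simplify the form of (xxiv) by taking the fixed axes now through the
centroid itself. This gives us

\[ \sigma_{p'_{q,q'}}\sigma_{p_{u,w'}}\, r_{p'_{q,q'}p_{u,w'}} = \frac{1}{N}\,(p_{q+u,q'+w'} - p_{q,q'}\, p_{u,w'} - u\, p_{u-1,w'}\, p_{q+1,q'} - w'\, p_{u,w'-1}\, p_{q,q'+1}) \tag{xxv} \]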


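Illustrations. (α) To find the correlation between a deviation due to random
sampling in a mean and one in the standard deviation of the same variate.

Take $q = 1$, $q' = 0$, ∴ $p'_{q,q'} = \bar{x}$; $u = 2$, $w' = 0$, ∴ $p_{u,w'} = p_{20} = \mu_2 = \sigma_x^2$.

\[ \sigma_{\bar{x}}\sigma_{p_{20}}\, r_{\bar{x}p_{20}} = \frac{1}{N}\,(p_{30} - p_{10}\, p_{20} - 2p_{10}\, p_{20}). \]

But $p_{10} = 0$, $\sigma_{\bar{x}} = \sigma_x/\sqrt{N}$,
$\sigma_{p_{20}} = \sqrt{\mu_4 - \mu_2^2}/\sqrt{N} = \mu_2\sqrt{\beta_2 - 1}/\sqrt{N}$, $p_{30} = \mu_3 = \sqrt{\beta_1}\,\sigma_x^3$.
Hence

\[ r_{\bar{x}p_{20}} = r_{\bar{x}\sigma_x} = \frac{\sqrt{\beta_1}}{\sqrt{\beta_2 - 1}} \tag{xxvi} \]

This is perfectly general; we see that variations in the mean are independent of
variations in the variability for all symmetrical systems including the Gaussian.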


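(β) To find the correlation between a deviation due to random sampling in
the mean of one variate and one in the standard deviation of a correlated variate.

Take $q = 1$, $q' = 0$, ∴ $p'_{q,q'} = \bar{x}$; $u = 0$, $w' = 2$, ∴ $p_{u,w'} = p_{02} = \mu_2' = \sigma_y^2$.

\[ \sigma_{\bar{x}}\sigma_{p_{02}}\, r_{\bar{x}p_{02}} = \frac{1}{N}\,(p_{12} - p_{10}\, p_{02} - 2p_{01}\, p_{11}). \]

But $p_{10} = p_{01} = 0$, $\sigma_{\bar{x}} = \sigma_x/\sqrt{N}$, $\sigma_{p_{02}} = \mu_2'\sqrt{\beta_2' - 1}/\sqrt{N}$.
It remains to consider

\[
\begin{aligned}
p_{12} &= \frac{1}{N}\, S\{n_{ss'}\,(x_s - \bar{x})\,(y_{s'} - \bar{y})^2\} \\
&= \frac{1}{N}\, S\{n_{s'}\,(\bar{x}_{s'} - \bar{x})\,(y_{s'} - \bar{y})^2\},
\end{aligned}
\]

where $\bar{x}_{s'}$ = mean of array of $x$'s corresponding to the $y_{s'}$ of $y$. Hence if the
regression be linear

\[ p_{12} = r_{xy}\,\frac{\sigma_x}{\sigma_y}\,\frac{1}{N}\, S\{n_{s'}\,(y_{s'} - \bar{y})^3\} = r_{xy}\,\frac{\sigma_x}{\sigma_y}\,\mu_3' = r_{xy}\,\sigma_x\,\sigma_y^2\,\sqrt{\beta_1'}. \]

Thus we have

\[ r_{\bar{x}p_{02}} = r_{\bar{x}\sigma_y} = r_{xy}\,\sqrt{\beta_1'}/\sqrt{\beta_2' - 1} \tag{xxvii} \]

Similarly

\[ r_{\bar{y}\sigma_x} = r_{xy}\,\sqrt{\beta_1}/\sqrt{\beta_2 - 1} \tag{xxviii} \]

Clearly

\[ r_{\bar{x}\sigma_y} = r_{xy}\, r_{\bar{y}\sigma_y} = r_{\bar{x}\bar{y}}\, r_{\bar{y}\sigma_y}, \]

and

\[ r_{\bar{y}\sigma_x} = r_{xy}\, r_{\bar{x}\sigma_x} = r_{\bar{x}\bar{y}}\, r_{\bar{x}\sigma_x}, \]

by (xv), which show us that these correlations are second order correlations, and
proves that the correlation of the mean of one variate with the variability of
a second is zero, for constant mean value of the second, since

\[ {}_{\bar{y}}r_{\bar{x}\sigma_y} = \frac{r_{\bar{x}\sigma_y} - r_{\bar{x}\bar{y}}\, r_{\bar{y}\sigma_y}}{\sqrt{1 - r^2_{\bar{x}\bar{y}}}\;\sqrt{1 - r^2_{\bar{y}\sigma_y}}} = 0. \]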
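The two illustrations above can be checked numerically. The sketch below is only an illustration: the function names are hypothetical, the inputs $\beta_1 = 4$, $\beta_2 = 9$ are assumed values (those of an exponential curve), and $\sqrt{\beta_1}$ is taken as positive, whereas in use it carries the sign of the third moment.

```python
import math


def corr_mean_own_sd(beta1: float, beta2: float) -> float:
    """Correlation of a mean with the standard deviation of the same variate, (xxvi)."""
    return math.sqrt(beta1) / math.sqrt(beta2 - 1.0)


def corr_mean_other_sd(r_xy: float, beta1_other: float, beta2_other: float) -> float:
    """Correlation of a mean with the standard deviation of the other variate,
    (xxvii) and (xxviii), assuming linear regression."""
    return r_xy * corr_mean_own_sd(beta1_other, beta2_other)


def partial_corr(r_ab: float, r_ac: float, r_bc: float) -> float:
    """Correlation of a and b for a constant value of c."""
    return (r_ab - r_ac * r_bc) / math.sqrt((1.0 - r_ac**2) * (1.0 - r_bc**2))


# Illustrative values only.
r_xy = 0.6
r_ybar_sy = corr_mean_own_sd(4.0, 9.0)            # mean of y with sigma_y
r_xbar_sy = corr_mean_other_sd(r_xy, 4.0, 9.0)    # mean of x with sigma_y
# By (xv) the means are correlated to the extent r_xy.
r_xbar_sy_given_ybar = partial_corr(r_xbar_sy, r_xy, r_ybar_sy)
r_symmetric = corr_mean_own_sd(0.0, 3.0)          # symmetrical system
```

The partial correlation comes out as zero to rounding error, in agreement with the statement that the correlation of one mean with the other variability is of the second order.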


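(γ) To find the correlation between a mean and a coefficient of correlation,
i.e. between $\bar{x}$ and $r_{xy}$.

We have to multiply $\delta\bar{x} = \delta p'_{10}$ with

\[ \frac{\delta r_{xy}}{r_{xy}} = \frac{\delta p_{11}}{p_{11}} - \frac{1}{2}\,\frac{\delta p_{20}}{p_{20}} - \frac{1}{2}\,\frac{\delta p_{02}}{p_{02}} \]

and apply (xxv) to each term. We have

\[
\begin{aligned}
\frac{\sigma_{\bar{x}}\,\sigma_{r_{xy}}\, r_{\bar{x}r_{xy}}}{r_{xy}} &= \frac{1}{N}\left\{\frac{p_{21}}{p_{11}} - \frac{1}{2}\,\frac{p_{30}}{p_{20}} - \frac{1}{2}\,\frac{p_{12}}{p_{02}}\right\} \\
&= \frac{1}{N}\left\{\frac{r_{xy}\,\sigma_y\,\sigma_x^2\,\sqrt{\beta_1}}{r_{xy}\,\sigma_x\,\sigma_y} - \frac{1}{2}\,\frac{\sqrt{\beta_1}\,\sigma_x^3}{\sigma_x^2} - \frac{1}{2}\,\frac{r_{xy}\,\sigma_x\,\sigma_y^2\,\sqrt{\beta_1'}}{\sigma_y^2}\right\} \\
&= \frac{\sigma_x}{N}\,\{\tfrac{1}{2}(\sqrt{\beta_1} - r_{xy}\sqrt{\beta_1'})\},
\end{aligned}
\]

using the values of $p_{21}$ and $p_{12}$ for linear regression. If we now use the value
in (xxi) for $\sigma_{r_{xy}}$ we have

\[ r_{\bar{x}r_{xy}} = \frac{\tfrac{1}{2}\, r_{xy}\,\{\sqrt{\beta_1} - r_{xy}\sqrt{\beta_1'}\}}{(1 - r_{xy}^2)\left\{1 - \tfrac{1}{4}(\beta_2 - 3 + \beta_2' - 3)\,\dfrac{r_{xy}^2}{1 - r_{xy}^2}\right\}^{1/2}}, \]

reducing when the kurtosis of both distributions is zero to

\[ r_{\bar{x}r_{xy}} = \frac{\tfrac{1}{2}\, r_{xy}\,\{\sqrt{\beta_1} - r_{xy}\sqrt{\beta_1'}\}}{1 - r_{xy}^2} \tag{xxx} \]

and vanishing for all symmetrical linearly correlated variates, including of course
Gaussian systems.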


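(δ) To find the correlation between a deviation in a standard deviation and
one in a coefficient of correlation. We have to multiply $\delta p_{20}$ by

\[ \frac{\delta r_{xy}}{r_{xy}} = \frac{\delta p_{11}}{p_{11}} - \frac{1}{2}\,\frac{\delta p_{20}}{p_{20}} - \frac{1}{2}\,\frac{\delta p_{02}}{p_{02}}. \]

We find

\[ \frac{1}{r_{xy}}\,\sigma_{r_{xy}}\,\sigma_{p_{20}}\, r_{p_{20}r_{xy}} = \frac{p_{31} - p_{20}\, p_{11}}{N\, p_{11}} - \frac{1}{2}\,\frac{p_{40} - p_{20}^2}{N\, p_{20}} - \frac{1}{2}\,\frac{p_{22} - p_{20}\, p_{02}}{N\, p_{02}}. \]

Hence, assuming linearity of regression, we may put

\[ p_{31}/p_{11} = \sigma_x^2\,\beta_2, \qquad p_{40}/p_{20} = \beta_2\,\sigma_x^2, \]

and approximately by the result immediately above Equation (xvii)

\[ p_{22}/p_{02} - p_{20} = \sigma_x^2\, r_{xy}^2\,\sqrt{(\beta_2 - 1)(\beta_2' - 1)}. \]

Thus

\[ \frac{1}{r_{xy}}\,\sigma_{r_{xy}}\,\frac{\sigma_x^2\sqrt{\beta_2 - 1}}{\sqrt{N}}\, r_{p_{20}r_{xy}} = \frac{1}{N}\,\{\sigma_x^2(\beta_2 - 1) - \tfrac{1}{2}\sigma_x^2(\beta_2 - 1) - \tfrac{1}{2}\sigma_x^2\, r_{xy}^2\,\sqrt{(\beta_2 - 1)(\beta_2' - 1)}\}. \]

Or

\[ \sigma_{r_{xy}}\, r_{\sigma_x r_{xy}} = \frac{r_{xy}}{2\sqrt{N}}\,\{\sqrt{\beta_2 - 1} - r_{xy}^2\sqrt{\beta_2' - 1}\}, \]

and by (xxi)

\[ r_{\sigma_x r_{xy}} = \frac{r_{xy}}{2}\;\frac{\sqrt{\beta_2 - 1} - r_{xy}^2\sqrt{\beta_2' - 1}}{(1 - r_{xy}^2)\left\{1 - \tfrac{1}{4}(\beta_2 - 3 + \beta_2' - 3)\,\dfrac{r_{xy}^2}{1 - r_{xy}^2}\right\}^{1/2}} \tag{xxxi} \]

Since homoscedastic linear regression supposes $\beta_2 = \beta_2'$ this result must be very
close to

\[ r_{\sigma_x r_{xy}} = \tfrac{1}{2}\, r_{xy}\,\sqrt{\beta_2 - 1}\Big/\left\{1 - \tfrac{1}{4}(\beta_2 - 3 + \beta_2' - 3)\,\frac{r_{xy}^2}{1 - r_{xy}^2}\right\}^{1/2} \tag{xxxii} \]

for such distributions.
For distributions in which the kurtosis is zero ($\beta_2 = 3$) we reach

\[ r_{\sigma_x r_{xy}} = \frac{r_{xy}}{\sqrt{2}} \tag{xxxiii} \]

a result already reached for Gaussian distributions by Pearson and Filon in 1897 *.
It is now shown to be true for all homoscedastic, mesokurtic systems with linear
regression.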
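As a check on the reduction from (xxxi) to (xxxiii), the following hedged sketch evaluates (xxxi) directly. The function name and the numerical inputs are hypothetical and serve only as an illustration.

```python
import math


def corr_sd_with_r(r: float, beta2_x: float = 3.0, beta2_y: float = 3.0) -> float:
    """Correlation between errors in sigma_x and in r_xy, following (xxxi)."""
    numerator = 0.5 * r * (math.sqrt(beta2_x - 1.0) - r**2 * math.sqrt(beta2_y - 1.0))
    factor = 1.0 - 0.25 * ((beta2_x - 3.0) + (beta2_y - 3.0)) * r**2 / (1.0 - r**2)
    return numerator / ((1.0 - r**2) * math.sqrt(factor))


# Zero kurtosis in both variates: the result should equal r / sqrt(2), as in (xxxiii).
r_mesokurtic = corr_sd_with_r(0.6)

# Equal but non-zero kurtosis (illustrative value beta2 = 4): compare with (xxxii).
r_equal_beta = corr_sd_with_r(0.6, 4.0, 4.0)
r_xxxii = 0.5 * 0.6 * math.sqrt(3.0) / math.sqrt(1.0 - 0.5 * 0.36 / 0.64)
```

When the two $\beta_2$'s are equal the factor $1 - r_{xy}^2$ cancels exactly, so (xxxi) and (xxxii) then agree identically rather than only approximately.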


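(5) Two further probable errors are of interest. If we write $\bar{y}_x = ax + b$ for
the regression line, what are the probable errors of

\[ a = r_{xy}\,\sigma_y/\sigma_x \quad \text{and} \quad b = \bar{y} - r_{xy}\,\sigma_y\,\bar{x}/\sigma_x\,? \]

It will be sufficient to give the values of $\sigma_a$ and $\sigma_b$ when the frequencies are
symmetrical and the regression linear. In this case

\[ r_{\bar{x}\sigma_x} = r_{\bar{x}\sigma_y} = r_{\bar{y}\sigma_y} = r_{\bar{y}\sigma_x} = 0, \qquad r_{\bar{x}\bar{y}} = r_{xy}, \]

\[ r_{r_{xy}\bar{x}} = 0, \quad r_{r_{xy}\bar{y}} = 0, \quad r_{r_{xy}\sigma_x} = r_{xy}/\sqrt{2}, \quad r_{r_{xy}\sigma_y} = r_{xy}/\sqrt{2}, \quad r_{\sigma_x\sigma_y} = r_{xy}^2. \]

Writing $\tau_x = \bar{x}/\sigma_x$, we have

\[ \frac{\delta a}{a} = \frac{\delta r_{xy}}{r_{xy}} + \frac{\delta\sigma_y}{\sigma_y} - \frac{\delta\sigma_x}{\sigma_x}, \]

and

\[ \delta b = \delta\bar{y} - a\,\delta\bar{x} - \delta r_{xy}\,\sigma_y\,\tau_x - r_{xy}\,\tau_x\,\delta\sigma_y + a\,\tau_x\,\delta\sigma_x. \]

Whence proceeding in the usual manner we deduce

\[ \sigma_a = \frac{\sigma_y}{\sigma_x}\,\sqrt{1 - r_{xy}^2}\Big/\sqrt{N} \tag{xxxiv} \]

†

\[ \sigma_b = \sigma_y\,\sqrt{1 - r_{xy}^2}\,\sqrt{1 + \tau_x^2}\Big/\sqrt{N} = \sigma_a\,\sqrt{\bar{x}^2 + \sigma_x^2} \tag{xxxv} \]

These enable us to determine any significant difference between two regression
lines.

We can go a stage further and ask what is the probable error of the mean of an
array, $\bar{y}_x$, as found from the regression line. We have

\[ \delta\bar{y}_x = x\,\delta a + \delta b. \]

* Phil. Trans. Vol. 191, A, p. 242.
† Cf. Phil. Trans. Vol. 191, A, p. 245.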
We require accordingly $\sigma_a\sigma_b\, r_{ab}$. Multiply the above expressions for $\delta a$ and $\delta b$
together, summing and dividing by the number of samples we find

\[ \sigma_a\sigma_b\, r_{ab} = -\frac{\sigma_y^2}{\sigma_x^2}\,\frac{\bar{x}\,(1 - r_{xy}^2)}{N} \tag{xxxvi} \]

Whence we deduce

\[ \sigma_{\bar{y}_x} = \frac{\sigma_y\,\sqrt{1 - r_{xy}^2}}{\sqrt{N}}\left\{1 + \frac{(x - \bar{x})^2}{\sigma_x^2}\right\}^{1/2} \tag{xxxvii} \]

We note the increase of inaccuracy of the means of arrays far from the mean of the
whole population *.
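The three regression results can be put together in a few lines. The sketch below is a hedged illustration: the function names and the numerical values are hypothetical, and it assumes, as the text does, symmetrical frequencies with linear regression. It confirms that combining $\sigma_a$, $\sigma_b$ and their product term through $\delta\bar{y}_x = x\,\delta a + \delta b$ reproduces (xxxvii).

```python
import math
from typing import Tuple


def regression_errors(r: float, sigma_x: float, sigma_y: float,
                      x_bar: float, n_total: int) -> Tuple[float, float, float]:
    """Return (sigma_a, sigma_b, sigma_a * sigma_b * r_ab) as in (xxxiv)-(xxxvi)."""
    sigma_a = (sigma_y / sigma_x) * math.sqrt(1.0 - r**2) / math.sqrt(n_total)
    sigma_b = sigma_a * math.sqrt(x_bar**2 + sigma_x**2)
    cov_ab = -x_bar * sigma_a**2
    return sigma_a, sigma_b, cov_ab


def sd_array_mean(x: float, r: float, sigma_x: float, sigma_y: float,
                  x_bar: float, n_total: int) -> float:
    """Standard deviation of the array mean read off the regression line, (xxxvii)."""
    base = sigma_y * math.sqrt(1.0 - r**2) / math.sqrt(n_total)
    return base * math.sqrt(1.0 + (x - x_bar) ** 2 / sigma_x**2)


# Illustrative values only.
r, sx, sy, xm, n = 0.5, 2.0, 3.0, 10.0, 400
sigma_a, sigma_b, cov_ab = regression_errors(r, sx, sy, xm, n)

x = 13.0
variance_from_parts = x**2 * sigma_a**2 + sigma_b**2 + 2.0 * x * cov_ab
sd_direct = sd_array_mean(x, r, sx, sy, xm, n)
sd_at_mean = sd_array_mean(xm, r, sx, sy, xm, n)
```

At $x = \bar{x}$ the error is least and equals $\sigma_y\sqrt{1 - r_{xy}^2}/\sqrt{N}$; it grows with the distance of the array from the mean.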


The above equations embrace the chief results for the probable errors and
error correlations of systems of two variables. They have been reached
independently of any system of Gaussian distribution.


* Results (xxxv), (xxxvi) and (xxxvii) are published here for the first time. 

















THE RELATIONSHIP BETWEEN THE WEIGHT OF THE 
SEED PLANTED AND THE CHARACTERISTICS OF 
THE PLANT PRODUCED. I*.


By J. ARTHUR HARRIS, Ph.D., Carnegie Institution of Washington. 


I. INTRODUCTORY REMARKS.


IN practical agriculture the quality of the seed planted is universally recognized 
as of fundamental importance. Three requisites are essential: freedom from 
noxious impurities, purity of breed and viability. All three of these points have 
attracted close attention, and have a voluminous literature. But given seeds 
belonging to the required variety, free from undesirable impurities and germinating 
successfully, it seems of theoretical interest at least, and perhaps of much practical 
importance as well, to ascertain the degree of relationship between the size of the 
seed planted and the characteristics of the individual into which it develops.


This question, although by no means so extensively discussed as the others, has
received considerable attention. Much has been written concerning the desirability
of winnowing seed to remove the lighter grains. A review of the literature of
this field would lead us too far from our immediate purpose, which is to present
the data derived from some rather extensive quantitative investigations.

It will not, however, be unprofitable to call attention to certain general 
deficiencies of the previous work, especially since this will define the point of view 
directing the studies described here. 

(1) In many cases the distinction between perfectly matured but small seeds 
and potentially large but immature, blighted or shrivelled seeds has been dis- 
regarded. 

(2) The method of grading the seed has, generally speaking, been neither
uniform nor logical. Usually the separation has been only into “heavy” and
“light,” or at most into “heavy,” “medium” and “light.” The meaning of such
crudely defined terms of course differs from experiment to experiment. They
may, with certain limitations, enable one to say which class of seeds gives the
best results. They do not permit comparisons of the advantages to be gained by
seed selection in different varieties. They do not allow of the writing of a general
formula enabling one to predict the yield from seed of given weight. Yet such a
formula is precisely what is needed. The advantage of sorting seed, if there be
an advantage, depends not only upon the increased yield (or increased uniformity
of the crop in certain cases) but also upon the cost of carrying out such a selection.
In deciding upon the stringency which is profitable in the seed selection, the
practical breeder should know the exact weight to be attached to each factor.


* This first paper is limited to the presentation of the data for number of pods per plant in twenty
series of garden beans belonging to three different varieties. Much material for other varieties is in
hand. The constants for other characters, e.g. number of ovules and number of seeds per pod and
seed weight, are being calculated and will be presented later with more general discussions.


(3) The experiments have not been carried out in a way to make possible 
the calculation of the probable errors of the results. In many cases the experi-
ments have been small, and the conclusions are open to serious question, both on 
the ground of possible experimental errors and on the ground of the probable 
errors of random sampling. 


All of these difficulties can be overcome by the application of the modern 
statistical methods to the problems. A wide series of such biometric constants 
deduced from carefully conducted physiological investigations ought to be of great 
service to the man dealing with the practical problems of agriculture. 


II. MATERIAL AND METHODS. 


The materials for this study are drawn from three varieties of garden beans— 
the White Flageolet, White Navy and Ne Plus Ultra. Altogether there are 
twenty series grown under most diverse environmental conditions. Their history
has already been given* for another purpose and need not be repeated here. The 
symbols used to designate the different lots are the same in both papers; hence 
the reader may make any comparisons which he sees fit. 


All of the seeds planted were, as far as could be determined by inspection, 
perfect in form and development. Each was individually weighed and classified 
in a uniformly graduated scale. These individually weighed and individually 
labelled seeds were then mixed and planted at random in rows by varieties, the 
rows in their turn being scattered over the field, more or less at random, in order 
to counteract by chance distribution any influence of the possible heterogeneity of 
the substratum upon the characters of the plants. All the determinations of 
number of pods produced were made on individual plants. 


Thus the probable errors for the statistical constants for weight of seed
planted, number of pods per plant, and for the correlation between the weight of
the seed planted and number of pods per plant, can be determined. The
correlations between weight of seed planted and number of pods produced are
comparable from variety to variety or from cultural condition to cultural condition.


* Harris, J. Arthur: “A First Study of the Influence of the Starvation of the Ascendants upon the
Characteristics of the Descendants.” Amer. Nat., 1912. In press.


To render the results more intelligible, the straight line regression equations 
have also been calculated and represented in a series of diagrams. These show by 
the slope of the lines the (smoothed) change in mean number of pods per plant 
associated with changes in the weight of the seed planted. 


The reader unacquainted with higher statistical methods need only remember
that the coefficient of correlation describes the degree of interdependence between
two variables on a scale of -1 to +1. This measure of interdependence is,
therefore, quite independent of the magnitude and of the variability of either or both
of the characters in question. The regression coefficient, on the other hand, shows
the absolute amount of change in a second character y consequent upon a change
of one unit on the scale of the first character x. Concretely, in our present case,
the regression coefficient shows the absolute increase (or decrease) in number of
pods per plant associated with an increase (or decrease) of one unit in the weight
of the seed planted. “Increase” or “decrease” is measured from the average
condition in the population as a whole.


The correlation coefficient is fully justified as a measure of interdependence 
only when regression is linear, that is to say, when the mean value of y increases 
at a uniform rate throughout the whole range of x. Where regression is not
strictly linear, the coefficient of correlation still furnishes in many cases a very 
satisfactory measure of the intensity of relationship between two variables. This 
is true in the present case. 


All the weighings were made on seeds which had dried for several months at 
laboratory temperature. Drying at high temperatures was of course precluded by 
the fact that the seeds were to be used for planting. Drying in a vacuum over 
sulphuric acid could not be undertaken because of the excessive labour involved 
where each seed had to be followed individually throughout the whole work. 
The weight unit adopted was .025 gram. Hence to obtain means and standard
deviations of weights in grams deduct .5 from values in tables and multiply by .025.
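The conversion from the working scale to grams can be sketched as below. This is an illustration under an explicit assumption: the deduction of .5 is applied to means only, since a standard deviation is unaffected by a shift of origin and needs only the multiplication by .025. The function names are hypothetical; the input 8.20 is the working-scale mean of Series FSS in Table II.

```python
UNIT_GRAMS = 0.025    # width of one weight class in grams
ORIGIN_OFFSET = 0.5   # class k is centred at (k - 0.5) units


def mean_to_grams(working_mean: float) -> float:
    """Convert a working-scale mean to grams."""
    return (working_mean - ORIGIN_OFFSET) * UNIT_GRAMS


def sd_to_grams(working_sd: float) -> float:
    """Convert a working-scale standard deviation to grams (assumed: no offset)."""
    return working_sd * UNIT_GRAMS


fss_mean_grams = mean_to_grams(8.20)   # mean seed weight of Series FSS
class_one_centre = mean_to_grams(1.0)  # centre of class 1
```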


The correlation tables showing the relationship between the weight of seed 
planted and the number of pods produced are entirely too bulky for publication.
It is possible, however, to present the essential data by showing the total number
of pods produced by each grade of seed weight (Tables III—VI). A convenient 
method of calculation for such cases has been suggested elsewhere*. 


* Harris, J. Arthur: “The Arithmetic of the Product Moment Method of Calculating the Coefficient
of Correlation.” Amer. Nat. Vol. XLIV. pp. 693-699, 1910.


In deducing the correlations from such tables, the means and standard
deviations for the two characters involved are required. The distributions of numbers
of pods per plant for the twenty series have already been published† for a quite
different purpose. The tables are very large and need not be given here. The
physical constants deduced from them appear in Table I. The distribution of the
seed weight is shown in the frequency columns of the reduced correlation tables.
The physical constants* for seed weight are given in Table II.


† Amer. Nat. 1912. In press.


III. ANALYSIS OF DATA.


1. Number of pods per plant in Navy, White Flageolet and Ne Plus Ultra†.


From Tables III to VI the value of the rough product moment $\Sigma(w'p')$ about
0 as origin may be calculated straight away by multiplying up the total pods by
the number of the weight class (in parentheses) and summing. The coefficient of
correlation, $r$, is then deduced from the formula

\[ r_{wp} = \frac{\Sigma(w'p')/N - \bar{w}\,\bar{p}}{\sigma_w\,\sigma_p}, \]

while the equation for the regression straight line is given by

\[ p = \left(\bar{p} - r_{wp}\,\frac{\sigma_p}{\sigma_w}\,\bar{w}\right) + r_{wp}\,\frac{\sigma_p}{\sigma_w}\, w, \]

where $w$ and $p$ represent weight of seed and pods per plant, the bars denote the
population means of the respective characters, and the sigmas their standard
deviations. The variable $p$ is integral and there is no need for grouping; the
unit of $w$ is .025 gram, with class 1 ranging from 0 to .025, and centered at
.0125 gram.
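The calculation just described may be sketched as follows. This is a hedged illustration, not the author's own arithmetic: the function name is hypothetical, the rows are the weight class, frequency F and total pods for Series FSS as transcribed from Table V, the standard deviation of pods per plant (7.41) is taken from Table I, and Sheppard's correction of 1/12 is applied to the seed-weight variance as stated in the footnote on that correction.

```python
import math
from typing import List, Tuple

# (weight class, frequency F, total pods) for Series FSS, transcribed from Table V.
FSS: List[Tuple[int, int, int]] = [
    (4, 6, 96), (5, 21, 275), (6, 72, 1002), (7, 170, 2432), (8, 241, 3566),
    (9, 194, 3065), (10, 118, 1827), (11, 36, 627), (12, 10, 153),
]
SIGMA_P_FSS = 7.41  # standard deviation of pods per plant for FSS, from Table I


def correlation_from_reduced_table(rows, sigma_p, sheppard=True):
    """Return (N, w_bar, p_bar, sigma_w, r, slope, intercept) from a reduced table."""
    n = sum(f for _, f, _ in rows)
    w_bar = sum(w * f for w, f, _ in rows) / n
    p_bar = sum(pods for _, _, pods in rows) / n
    var_w = sum(f * w * w for w, f, _ in rows) / n - w_bar**2
    if sheppard:
        var_w -= 1.0 / 12.0  # Sheppard's correction for grouping, unit class width
    sigma_w = math.sqrt(var_w)
    # Rough product moment about zero: class number times total pods, summed.
    product_moment = sum(w * pods for w, _, pods in rows) / n
    r = (product_moment - w_bar * p_bar) / (sigma_w * sigma_p)
    slope = r * sigma_p / sigma_w        # pods gained per unit (.025 gram) of seed weight
    intercept = p_bar - slope * w_bar
    return n, w_bar, p_bar, sigma_w, r, slope, intercept


n, w_bar, p_bar, sigma_w, r, slope, intercept = correlation_from_reduced_table(
    FSS, SIGMA_P_FSS
)
cv_w = 100.0 * sigma_w / w_bar  # coefficient of variation of seed weight
```

The means recovered in this way agree with the tabled values for FSS (8.20 for seed weight, 15.03 for pods per plant), which gives some assurance that the rows have been transcribed correctly.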


The correlation coefficients and regression equations are given in Table VII. 


The straight line regression equations being available for all series, it is only 
necessary to determine the empirical means for number of pods per plant to test 
by graphical methods the linearity of the regression of pods per plant on weight 
of seed planted. 


It is not feasible—because of the rather great labour demanded, and the com- 
plexity of the diagrams—to do this for all the series. In the diagrams (Figures 
1—38) the slope of all the twenty lines given by the equations is shown. In the 
case of certain of these lines, selected quite at random, the values of the empirical 
means are indicated in the usual way. 


The empirical means are scattered with some irregularity and from inspection
alone one might suspect that regression is not strictly linear‡. In other words,
the mean number of pods may not increase at the same rate from the lowest to
the highest, the lightest to the heaviest, grade of seeds. Practically speaking they
may be considered to do so.


* Sheppard’s correction was applied in calculating the moments for seed weight, but not in obtaining
those for number of pods per plant, since the latter varies discretely.

† Other series of material, and the relationships for seed weight and characters other than number
of pods are being reduced.

‡ The sensitiveness of the number of pods per plant to environmental conditions (and consequently
the great variability in the means, especially near the ends of the range of seed weight) is so great that
it has seemed inadvisable to attempt more refined mathematical treatment of the problem of linearity
of regression. I hope to do this later on larger series of material.


The coefficients of correlation are rather small, ranging from about .120 to
.280. The average of the twenty series is .1615. It may be noted, however, that
they are in every instance positive, thus indicating that the selection of larger 
seeds will give a somewhat higher yield of pods. 


The second term of the regression equation enables us to read off at once the 
increase in the number of pods per plant to be secured by selecting seeds one unit 
(i.e. 25 mg.) or more above the average. These values range widely, as is to be
expected from the fact that the crops upon which they were calculated were pur- 
posely grown under most diverse cultural conditions. The slopes of the regression 
lines will make clear to the eye the advantages to be secured by planting heavier 
seed. 


COLD SPRING HARBOR, U.S.A.,
January 29, 1912. 


TABLE I.
Pods per Plant.

Series | Total Plants | Mean and Probable Error | Standard Deviation and Probable Error | Coefficient of Variation and Probable Error
NHH    | 1484 | 16.99 ± .15 | 8.67 ± .11 | 51.02 ± 0.78
NHHH   | 1271 | 11.93 ± .10 | 5.17 ± .07 | 43.29 ± 0.68
NHD    | 1416 |  3.97 ± .03 | 1.94 ± .02 | 48.97 ± 0.76
NHDD   | 1204 |  4.58 ± .05 | 2.38 ± .03 | 51.84 ± 0.88
NDD    |  513 |  3.59 ± .06 | 1.89 ± .04 | 52.66 ± 1.38
NDDD   |  459 |  4.41 ± .06 | 1.93 ± .04 | 43.86 ± 1.15
NDH    |  670 | 14.62 ± .21 | 8.24 ± .15 | 56.39 ± 1.33
NDHH   |  565 | 11.83 ± .14 | 4.96 ± .10 | 41.94 ± 0.98
FSS    |  868 | 15.03 ± .17 | 7.41 ± .12 | 49.34 ± 0.97
FSC    |  586 | 14.22 ± .21 | 7.38 ± .15 | 51.90 ± 1.27
FSH    |  475 | 17.29 ± .25 | 7.94 ± .17 | 45.89 ± 1.20
FSHH   |  429 | 11.84 ± .16 | 4.80 ± .11 | 40.50 ± 1.08
FSD    |  428 |  3.43 ± .06 | 1.69 ± .04 | 49.42 ± 1.39
FSDD   |  387 |  4.04 ± .06 | 1.73 ± .04 | 42.85 ± 1.22
USS    |  680 | 15.74 ± .16 | 6.04 ± .11 | 38.37 ± 0.80
USC    |  530 | 10.14 ± .13 | 4.38 ± .09 | 43.23 ± 1.05
USH    |  361 | 14.04 ± .20 | 5.55 ± .14 | 39.56 ± 1.14
USHH   |  224 |  8.44 ± .15 | 3.24 ± .10 | 38.45 ± 1.25
USD    |  312 |  2.59 ± .05 | 1.23 ± .03 | 47.30 ± 1.54
USDD   |  237 |  3.62 ± .09 | 2.10 ± .07 | 57.97 ± 2.32
TABLE II.
Seed Weights in Working Scale.

Series | Total Seeds | Mean and Probable Error
NHH    | 1484 |  9.44 ± .03
NHHH   | 1271 |  9.27 ± .03
NHD    | 1416 |  9.46 ± .03
NHDD   | 1204 |  8.46 ± .03
NDD    |  513 |  7.73 ± .04
NDDD   |  459 |  8.76 ± .04
NDH    |  670 |  7.62 ± .03
NDHH   |  565 |  8.93 ± .04
FSS    |  868 |  8.20 ± .03
FSC    |  586 |  8.31 ± .04
FSH    |  475 |  8.33 ± .04
FSHH   |  429 |  8.48 ± .04
FSD    |  428 |  8.22 ± .05
FSDD   |  387 |  7.19 ± .04
USS    |  680 | 14.14 ± .07
USC    |  530 | 14.50 ± .08
USH    |  361 | 14.33 ± .09
USHH   |  224 |  3.94 ± .09
USD    |  312 | 14.39 ± .10
USDD   |  237 | 10.85 ± .08


Weight of 
Seed Planted 


050—"075 ( 3 
-075—'100 ( 4 
*100—'125 ( 5 


Series NHH 


F Total Pods F 


oe 
peng Ss me ig 
| 


3 5 
3 36 
23 306 


TABLE 


| Series NHHH 


Total Pods F 


Standard Deviation 
and 
Probable Error 


ho = bobo Ro 
* >< > or ra * « * 

oO 

& 

} 

& 


IIl. 





Series NHD 


Total Pods 





Coefficient of Variation and Probable Error

15.96 ± .20
14.36 ± .20
15.93 ± .21
16.66 ± .24
17.05 ± .37
14.89 ± .34
16.83 ± .32
13.93 ± .28
17.48 ± .29
16.81 ± .34
17.28 ± .39


Series NHDD 
F | Total Peds 
| 


DS oe, 2 2 4 

—}; — |j—] — 3 | 9 

— <=. ch (a0 10 14 | 54 

10 89 18 57 80 | 350 

95 1015 89 294 230 1017 

| 265 2932 276 1019 319 1382 

| 376 4583 386 1559 285 | 1317 

308 | 3701 298 1235 157 782 

151 1959 211 860 80 | ~ 401 

| 51 668 95 413 25| 153 

8 106 28 117 6 29 

4 63 8 40 3 19 

io 39 3 13 —}; — 
| 1 9 zs = aps 

1271 | 15164 1416 5619 1204 5517 





150—"175 (7) | 96 

175—200 ( 8) | 281 | 4102 
200—"225( 9)| 401 | 6614 
225-—+250 (10) | 330 | 6200 
250—"275 (11) | 216 | 4040 
275—"300(12)| 94] 1734 
300—*325 (13) 25 545 
325—'350(14)| 9| 119 
360—"375 (15)| 5 76 
375—*400 (16) | _— — 
Totals | 1484 | 25216 

| 














Totals ve 513 


Series NDD 


Weight of 
Seed Planted | 
| F 
—|——|—— 
*050—*075 ( 3) — 
*O75—*100 ( 4) 3 
*100—°125 ( 5)| 25 
*125—°150 ( 6) 53 
"150—"175 ( 7) | 143 
*175—°200 ( 8) | 151 
200—°2235 ( 9) 87 
*225—°250(10)| 42 
250—*275 (11) 8 
275—*300 (12) | l 
400—-*325 (13) — 
*325—°350 (14) —- 
350—*375 (15) 
*3T5—°*400 (16) 


TABLE IV. 


Series NDDD_ | 


| Total Pods F | Total Pods F 
pees ast e 1 
6 2 | 4 8 
65 4 | 17 26 
152 15 47 76 
181 44 195 198 
564 124 524 2039 
341 147 616 104 
175 87 420 37 
5d 27 140 | 9 
4 6 42 | @ 

— 3 18 — 
aan rt pa 

_ a . 

| 

1843 459 | 2023 670 





TABLE V.
Series NDH


Series NDHH 





Total Pods F 
ee 
) a eee | 
326 | 1 
Si | -3 
2653 | 42 
3103 | 160 
1890 | 164 
641 | 130 
188 42 
49 | 9 
9794 | 565 





17 
116 
427 

1847 
1963 
1613 
546 
123 
28 


Total Pods | 








Series F'SS 

Weight of 
Seed Planted « rages 
Total 


| 7 

F | Pods 

050—‘075 ( 3) —_— — 
075—'100 ( 4) 6 96 
*100—‘125 ( 5) 21 275 
*125—*150 ¢ 6) 72 1002 
*150—'175 ( 7) | 170 2432 
175—°200 ( 8) | 241 3566 
200—*225 ( 9) | 194 3065 
*225—*250 (10) | 118 1827 
250—*275 (11) 36 627 
275—*300 (12) 10 153 

300—*325 (13) | — — 
Totals 868 13043 


| 586 


Series FSC 
s 
| Total 





. ’ Total 
F | Pods | ¥ | Pods | 2 
1 a ie — = 
l 5 2 13 1 
15| 173 9| 139 g 
40) 463 | 43| 652 | 13 
98 | 1238 | 80/| 1369 | 43 
168 | 2355 | 116 | 2049 | 166 
154 | 2427 | 131 | 2265 | 131 
72 | 1164 | 64] 1207 | °60 
31| 420 | 24| 414 | 10 
6 69 5 90 | 2 
= | nin 1 17 —s 
« fo en — 5 
8334 | 475 8215 | 429 
| 








5080 


| 





. } T otal 

I Pods 
2 4 

3 9 
10 30 
25 63 
7 252 
138 498 
102 364 
43 152 
20 57 

7) 8 
| 428 1466 


387 | 1562 
| 








Series FSH | Series FSHH | Series FSD | Series FSDD 


Total 
Pods 
Weight of
Seed Planted 


Series USS 








$626 —* 
550— 575 
‘575—*600 


‘075—"100 ( 4) 
*100—*125 ( 5) 
‘125—-150 ( 6) 
‘150—*175 ( 7) 
*175—*200 ( 8) 
"200—+225 ( 9) 
*225—+250 (10) 
*250—"275 (11) 
‘275 —+300 (12) 
"300-—*325 (13) 
“325—+350 (14) 
*350—'375 (15) 
‘375 —+400 (16) 
400 —*425 (17) 
*425—450 (18) 


*450—'475 (19) 
‘475—*500 (20) 
500—‘525 (21) 





we 


Totals 


, | Total | 
F | Pods | 
—|-— 
eee eae 
— Silent | 
a Eee 
8 98 
18 236 
57 943 
| 118 1745 
124) 1924 
| 117 





NHH 


Series USC 





530 


19 295 
23 | 402 
16 | 323 
13 | 246 
6 94 
3| 87 
2 48 
| 
680 | 10702 


| 
| 
































TABLE VI. 
| 
Series USH | Series USHH | Series USD | Series USDD
| & 
Total ra Total F Total PF | Total 5 Total 
Pods Pods Pods Pods F Pods 
| 
| | Bee: Sas 
5 | — | ae Ee RA > | cnet <P pg eet 
obs ey, We — aS ae as 
20 oe Bo ee oe 2 7 
= | t— +E 2i aS es | 5 17 
7) 2) te bh PS ee eee 
27 1 Te ieee 5} ll | 31 91 
42 10 128 5 28 3 7 | 49 161 
252 | 97 336 | 20| 176 16 | 32 | 61 | 216 
543 | 431 536 | 96) 180 48|°114 | 34 | 155 
916 | 73! 1020 | 43) 346 55 | 135 | 92 | 93 
1120 | 56| 828 | 49) 434 60| 163 | 14 | 55 
805 | 61 S45 | 36 323 53 | 142 | 5 | 95 
616 | 28| 445 | 26| 235 | 19| 47 | 1| 5 
284 15|/ 192] 8 76 14| 39 eer TAG ee 
236 | 12] 203 | 6| 47 “ERS SS Bg aes 
216 | 11] 141 | 4 31 12 31 2. ue 
151 | 8| 133 1 14 i ae eee = 
92 | 11| 174 | uk ee Cs Poe 
Sea ee ee ee ee ee 
“3 eee eee i el Sp 
= | 1} 18 | - ey Pa.5 oe 
| | | 
ext alan 2 bed lemon es ee! oe) 
5376 | 361 | 5069 | 224 1890 | 312 809 e | 858 





TABLE VII. 
Number of Pods and Weight of Seed in Working Scale 


Coefficient of 


Correlation and 


Probable Error 


Regression Straight 
Line Equation 





.177 ± .017
.145 ± .019
.129 ± .018
.121 ± .019
.282 ± .027
.215 ± .030
.258 ± .024
.152 ± .028
.098 ± .023
.147 ± .027
.100 ± .031
.121 ± .032
.130 ± .032
.144 ± .034
.155 ± .025
.150 ± .029
.129 ± .035


.195 ± .037
.241 ± .041





p= 
e~ 
p= 


p= 


*383+1°017 w 
‘719 +0°562 w 
*396 +0°166 w 
"862 +0°203 w 
‘469 +0°404 w 
*618+0°319 w 
"982+ 1°657 w 
*412+0°606 w 
"856 +0°508 w 
‘787+0°774w 
*692+-0°553 w 
*256+-0°541 w 
*164+0°153 w 
2°328 +0°238 w 


owas! 


wa WwOaQne 


p 
p=10°876+0°344 w 


6°453 +0°255 w 


p=10°165+0°271 w 


p= 
p= 
p= 





5°076+0°241 w 
1°287+0°091 w 
*535 +0°284 w 


of Weights. 
ON THE PROBABLE ERROR OF A COEFFICIENT OF
CORRELATION AS FOUND FROM A FOURFOLD 
TABLE. 


By KARL PEARSON, F.R.S. 


Let the fourfold table be

  a   |  b   | a+b
  c   |  d   | c+d
 a+c  | b+d  |  N





Then on the assumption that the frequency distribution is normal, we can by aid
of Everitt's Tables of the Tetrachoric Functions* rapidly find r. I have shown in
a paper published in the Phil. Trans. in 1900† that, r being found in this way,


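$$\text{Probable error of } r = \frac{.67449}{\sqrt{N}\,\chi_0}\left\{\frac{(a+d)(c+b)}{4N^2} + \psi_2^2\,\frac{(a+c)(d+b)}{N^2} + \psi_1^2\,\frac{(a+b)(d+c)}{N^2} + 2\psi_1\psi_2\,\frac{ad-bc}{N^2} - \psi_2\,\frac{ab-cd}{N^2} - \psi_1\,\frac{ac-bd}{N^2}\right\}^{\frac12} \quad \text{(i)},$$

where
$$\psi_1 = \frac{1}{\sqrt{2\pi}}\int_0^{\beta_1} e^{-\frac12 z^2}\,dz, \qquad \psi_2 = \frac{1}{\sqrt{2\pi}}\int_0^{\beta_2} e^{-\frac12 z^2}\,dz,$$
$$\beta_1 = \frac{h - rk}{\sqrt{1-r^2}}, \qquad \beta_2 = \frac{k - rh}{\sqrt{1-r^2}},$$
$$\chi_0 = \frac{1}{2\pi\sqrt{1-r^2}}\; e^{-\frac12\,\frac{1}{1-r^2}\,(h^2+k^2-2rhk)},$$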
* Biometrika, Vol. VII, p. 436, and Vol. VIII, p. 385.
† Phil. Trans. Vol. 195 A, p. 14. Owing to the carelessness of the printers my $\chi_0$ was put as $\sqrt{\chi_0}$
and the last $N^2$ in the denominator as $N$.
and h and k have their usual meaning defined by the integrals

$$\frac{(a+c)-(b+d)}{2N} = \frac{1}{\sqrt{2\pi}}\int_0^{h} e^{-\frac12 z^2}\,dz = \tfrac12\alpha_1, \text{ say};$$
$$\frac{(a+b)-(c+d)}{2N} = \frac{1}{\sqrt{2\pi}}\int_0^{k} e^{-\frac12 z^2}\,dz = \tfrac12\alpha_2, \text{ say}.$$

Let $H = \frac{1}{\sqrt{2\pi}}\,e^{-\frac12 h^2}$, $K = \frac{1}{\sqrt{2\pi}}\,e^{-\frac12 k^2}$, as usual.


Now the formula (i) above for the probable error of r is admittedly laborious 
in use. I have tried in many ways, while retaining its full accuracy, to throw it 
into a form involving less laborious calculations; I have not succeeded, however, 
in achieving any sensible reduction in its complexity, as long as I maintain its 
complete generality. 
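
The labour of formula (i) is now easily delegated to a machine. The following is a minimal sketch in Python, assuming that h and k are obtained from the marginal proportions (a+c)/N and (a+b)/N by the inverse normal integral, and that r has already been found; the function name `probable_error_full` is hypothetical and is not part of the paper. The figures used are those of the table of Illustration V given later in this paper.

```python
from math import exp, pi, sqrt
from statistics import NormalDist

STD = NormalDist()  # standard normal curve (mean 0, s.d. 1)


def probable_error_full(a, b, c, d, r):
    """Probable error of r by formula (i) for the fourfold table a, b / c, d."""
    n = a + b + c + d
    # h and k: normal deviates at which the two margins are divided
    h = STD.inv_cdf((a + c) / n)
    k = STD.inv_cdf((a + b) / n)
    s = sqrt(1.0 - r * r)
    # psi_1 and psi_2: normal areas from 0 to beta_1 and beta_2
    psi1 = STD.cdf((h - r * k) / s) - 0.5
    psi2 = STD.cdf((k - r * h) / s) - 0.5
    # chi_0: ordinate of the normal correlation surface at (h, k)
    chi0 = exp(-(h * h + k * k - 2 * r * h * k) / (2 * s * s)) / (2 * pi * s)
    n2 = n * n
    bracket = (
        (a + d) * (c + b) / (4 * n2)
        + psi2 ** 2 * (a + c) * (d + b) / n2
        + psi1 ** 2 * (a + b) * (d + c) / n2
        + 2 * psi1 * psi2 * (a * d - b * c) / n2
        - psi2 * (a * b - c * d) / n2
        - psi1 * (a * c - b * d) / n2
    )
    return 0.67449 / (sqrt(n) * chi0) * sqrt(bracket)


# Table of Illustration V, with r = .8464 as stated there
print(round(probable_error_full(1196, 223, 318, 1263, 0.8464), 4))
```

The value of r must be supplied from the tetrachoric tables or otherwise; the sketch does not itself solve for r.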


Although many hundred fourfold tables have now been published, many of which 
give such small correlations that their true significance can only be settled by 
a knowledge of their probable errors, yet I find only 40 to 50 probable errors have 
so far been determined. This matter seems so regrettable that I have sought for 
a fairly easy method of determining a closely empirical expression for the probable 
error of r which is likely to be of service, and can be adapted easily to tables. 


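I consider first two extreme cases. If h and k are both zero, or the fourfold
division at the mean, then $\psi_1 = \psi_2 = 0$*, and

$$\text{Probable error of } r = \frac{.67449}{\sqrt N}\,2\pi\sqrt{1-r^2}\left\{\frac{(a+d)(b+c)}{4N^2}\right\}^{\frac12} = \frac{.67449}{\sqrt N}\,2\pi\sqrt{1-r^2}\;\frac{\sqrt{ab}}{N},$$

since in this case a = d, and b = c.

But for a division at the mean by Sheppard’s Theorem

$$r = \cos\frac{\pi b}{a+b} = \sin\left(\frac{\pi}{2}\,\frac{a-b}{a+b}\right),$$

or $(\sin^{-1} r)/\tfrac12\pi = (a-b)/(a+b)$.

Hence $$1 - \left(\frac{\sin^{-1} r}{\tfrac12\pi}\right)^2 = \frac{4ab}{(a+b)^2} = \frac{16ab}{N^2}.$$

Substituting we have:

$$\text{Probable error of } r = \frac{.67449}{\sqrt N}\,\frac{\pi}{2}\,\sqrt{1-r^2}\,\sqrt{1-\left(\frac{\sin^{-1} r}{90^\circ}\right)^2} \quad \text{(ii)},$$

if the angle of the inverse sine be read in degrees.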
Again if r = 0, the probable error of r may be obtained from (i) whatever the
values of h and k. For in this case
$ad - bc = 0$, $\psi_1 = \tfrac12\alpha_1$, $\psi_2 = \tfrac12\alpha_2$, $\chi_0 = HK$.
* Phil. Trans. Vol. 192 A, p. 141 and Vol. 195 A, p. 7. 
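We have $(b+d)/N = \tfrac12(1-\alpha_1)$, $(a+c)/N = \tfrac12(1+\alpha_1)$,
$(a+b)/N = \tfrac12(1+\alpha_2)$, $(c+d)/N = \tfrac12(1-\alpha_2)$,

$$\frac{a+d}{N} = \frac{(a+b)(a+c)}{N^2} + \frac{ad-bc}{N^2} + \frac{(c+d)(b+d)}{N^2} + \frac{ad-bc}{N^2}$$
$$= \tfrac14(1+\alpha_1)(1+\alpha_2) + \tfrac14(1-\alpha_1)(1-\alpha_2) = \tfrac12(1+\alpha_1\alpha_2),$$

since ad − bc = 0 in the original population.

Similarly: $$\frac{c+b}{N} = \tfrac12(1-\alpha_1\alpha_2).$$

$$\frac{ab-cd}{N^2} = \frac{a(N-a-c-d)-cd}{N^2} = \frac{a}{N} - \frac{(a+c)(a+d)}{N^2}$$
$$= \tfrac14(1+\alpha_1)(1+\alpha_2) - \tfrac14(1+\alpha_1)(1+\alpha_1\alpha_2) = \tfrac14\alpha_2(1-\alpha_1^2),$$

and similarly: $$\frac{ac-bd}{N^2} = \tfrac14\alpha_1(1-\alpha_2^2).$$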


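Hence substituting in (i)

$$\text{Probable error of } r = \frac{.67449}{\sqrt N\,HK}\left\{\tfrac1{16}(1-\alpha_1^2\alpha_2^2) + \tfrac1{16}\alpha_2^2(1-\alpha_1^2) + \tfrac1{16}\alpha_1^2(1-\alpha_2^2) - \tfrac18\alpha_2^2(1-\alpha_1^2) - \tfrac18\alpha_1^2(1-\alpha_2^2)\right\}^{\frac12}$$
$$= \frac{.67449}{\sqrt N\,HK}\;\tfrac14\sqrt{(1-\alpha_1^2)(1-\alpha_2^2)} \quad \text{(iii)}.$$

This can also be put in the form:

$$\text{Probable error of } r = \frac{.67449}{\sqrt N\,HK}\;\frac{\sqrt{(a+b)(a+c)(d+b)(d+c)}}{N^2} \quad \text{(iv)}.$$

This is the probable error of r of a fourfold table when the real value of r is zero.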

Now as (ii) and (iv) give the reducing factors for the two cases (a) when h and
k are both zero but r has any value and (b) when h and k have any values but r is
zero, it occurred to me that the combined product of the two would give good
results for a considerable range of values of h and k and r. We have to note that
(iv) for h and k zero becomes
$$\frac{.67449}{\sqrt N}\,\frac{\pi}{2}.$$
Hence we take as our formula:


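$$\text{Probable error of } r = \frac{.67449}{\sqrt N}\,\sqrt{1-r^2}\,\sqrt{1-\left(\frac{\sin^{-1} r}{90^\circ}\right)^2}\;\frac{\sqrt{\tfrac12(1+\alpha_1)\,\tfrac12(1-\alpha_1)}}{H}\;\frac{\sqrt{\tfrac12(1+\alpha_2)\,\tfrac12(1-\alpha_2)}}{K} \quad \text{(v)}.$$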
Now it will be seen that this consists of three parts:

(a) $\sqrt{1-r^2}\,\sqrt{1-\left(\dfrac{\sin^{-1} r}{90^\circ}\right)^2}$. This is easy to table for all values of r.

(b) $\dfrac{\sqrt{\tfrac12(1+\alpha_1)\,\tfrac12(1-\alpha_1)}}{H}$,

(c) $\dfrac{\sqrt{\tfrac12(1+\alpha_2)\,\tfrac12(1-\alpha_2)}}{K}$.

Both these (b) and (c) can be readily found from a single table rapidly formed from
Sheppard’s Table of the Probability Integral. The entry to the single table will
be (a+c)/N or (a+b)/N, i.e. $\tfrac12(1+\alpha)$.

Thus a knowledge of the correlation r and the two division percentages (together 
with Miss Gibson’s Table for .67449/√N), will enable us by the aid of the two
new tables to rapidly write down four factors whose product gives the required 
probable error. I have tested the form (v) against the true probable error as found 
from (i). In all cases it gave results differing only from the true value at most by 
about one or two units in the third place of figures—a result amply accurate for 
all practical purposes. 
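
The short formula (v) may be sketched in a few lines of Python. This is only an illustration under stated assumptions: the division points h and k are taken from the marginal proportions by the inverse normal integral, r is supplied as already found, and the names `margin_factor` and `probable_error_short` are hypothetical. The tables and values of r are those of Illustrations I to V below.

```python
from math import asin, degrees, sqrt
from statistics import NormalDist

STD = NormalDist()  # standard normal curve


def margin_factor(p):
    """sqrt(p(1 - p)) / H, where p = (1 + alpha)/2 and H is the ordinate at the division."""
    ordinate = STD.pdf(STD.inv_cdf(p))
    return sqrt(p * (1.0 - p)) / ordinate


def probable_error_short(a, b, c, d, r):
    """Probable error of r by the short formula (v) for the table a, b / c, d."""
    n = a + b + c + d
    base = 0.67449 / sqrt(n)
    # factor depending on r alone, the inverse sine being read in degrees
    r_factor = sqrt(1.0 - r * r) * sqrt(1.0 - (degrees(asin(r)) / 90.0) ** 2)
    return base * r_factor * margin_factor((a + c) / n) * margin_factor((a + b) / n)


illustrations = {
    "I": (211.25, 153.75, 152.75, 560.25, 0.5557),
    "II": (1562, 42, 383, 94, 0.5954),
    "III": (455, 622, 599, 1324, 0.1811),
    "IV": (849, 665, 205, 1281, 0.6633),
    "V": (1196, 223, 318, 1263, 0.8464),
}
for name, (a, b, c, d, r) in illustrations.items():
    print(name, round(probable_error_short(a, b, c, d, r), 4))
```

The four factors are kept separate in the code so that each may be compared with the corresponding tabled quantity.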

Illustration I.

211.25 | 153.75 |  365
152.75 | 560.25 |  713
364    | 714    | 1078

The correlation was found to be .5557 ± .0261; the probable error from the short
formula was .0265.

Illustration II.

1562 |  42 | 1604
 383 |  94 |  477
1945 | 136 | 2081

The correlation was found to be .5954 ± .0272; the probable error from the short
formula was .0293.

Illustration III.

 455 |  622 | 1077
 599 | 1324 | 1923
1054 | 1946 | 3000

The correlation was found to be .1811 ± .0210; the probable error from the short
formula was .0199.

Illustration IV.

 849 |  665 | 1514
 205 | 1281 | 1486
1054 | 1946 | 3000

The correlation was found to be .6633 ± .0132; the probable error from the short
formula was .0132.

Illustration V.

1196 |  223 | 1419
 318 | 1263 | 1581
1514 | 1486 | 3000

The correlation was found to be .8464 ± .0079; the probable error from the short
formula was .0079.

These examples will suffice, I think, to give confidence in the formula and in 
the tables accompanying this paper. The absence of probable errors from the 
expressions for fourfold table correlations can no longer be justified on the ground 
of their great laboriousness. 

The following Tables have been calculated by Miss Julia Bell. 


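Let $\chi_1 = .67449/\sqrt N$. This is given by Miss Gibson’s Tables, Biometrika,
Vol. III. p. 387. Let

$$\chi_r = \sqrt{1-r^2}\,\sqrt{1-\left(\frac{\sin^{-1} r}{90^\circ}\right)^2}$$

and $$\chi_\alpha = \frac{1}{H}\sqrt{\tfrac12(1+\alpha)\times\tfrac12(1-\alpha)}.$$

Then Probable error of $r = \chi_1 \cdot \chi_r \cdot \chi_{\alpha_1} \cdot \chi_{\alpha_2}$.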
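
Entries of the two tables may be checked, or intermediate values obtained without interpolation, by a sketch such as the following. It assumes only the definitions just given, with H taken as the normal ordinate at the deviate cutting off the proportion ½(1+α); the function names `chi_r` and `chi_alpha` are hypothetical.

```python
from math import asin, degrees, sqrt
from statistics import NormalDist

STD = NormalDist()  # standard normal curve


def chi_r(r):
    """Factor depending on r alone; the inverse sine is read in degrees."""
    return sqrt(1.0 - r * r) * sqrt(1.0 - (degrees(asin(r)) / 90.0) ** 2)


def chi_alpha(p):
    """Factor for one margin, entered with p = (1 + alpha)/2, i.e. (a+c)/N or (a+b)/N."""
    ordinate = STD.pdf(STD.inv_cdf(p))  # H (or K) at the point of division
    return sqrt(p * (1.0 - p)) / ordinate


for r in (0.20, 0.50, 0.80):
    print(f"r = {r:.2f}  chi_r = {chi_r(r):.4f}")
for p in (0.50, 0.80, 0.95):
    print(f"(1+alpha)/2 = {p:.2f}  chi_alpha = {chi_alpha(p):.4f}")
```

Since the margin factor is unchanged when p is replaced by 1 − p, a proportion below .50 is entered by its complement, as in the printed table.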
TABLE I. Values of $\chi_r$ for Values of r.

  r  | $\chi_r$ |  r  | $\chi_r$ |  r  | $\chi_r$ |  r  | $\chi_r$ |  r   | $\chi_r$
 .00 | 1.0000 | .20 | .9717 | .40 | .8845 | .60 | .7298 | .80  | .4843
 .01 |  .9999 | .21 | .9688 | .41 | .8785 | .61 | .7200 | .81  | .4687
 .02 |  .9997 | .22 | .9657 | .42 | .8723 | .62 | .7099 | .82  | .4526
 .03 |  .9994 | .23 | .9625 | .43 | .8659 | .63 | .6997 | .83  | .4362
 .04 |  .9989 | .24 | .9591 | .44 | .8594 | .64 | .6892 | .84  | .4192
 .05 |  .9982 | .25 | .9556 | .45 | .8527 | .65 | .6785 | .85  | .4018
 .06 |  .9975 | .26 | .9520 | .46 | .8458 | .66 | .6675 | .86  | .3838
 .07 |  .9966 | .27 | .9482 | .47 | .8388 | .67 | .6563 | .87  | .3652
 .08 |  .9955 | .28 | .9442 | .48 | .8315 | .68 | .6448 | .88  | .3461
 .09 |  .9943 | .29 | .9401 | .49 | .8241 | .69 | .6331 | .89  | .3262
 .10 |  .9930 | .30 | .9358 | .50 | .8165 | .70 | .6211 | .90  | .3057
 .11 |  .9915 | .31 | .9314 | .51 | .8087 | .71 | .6088 | .91  | .2843
 .12 |  .9899 | .32 | .9268 | .52 | .8007 | .72 | .5962 | .92  | .2620
 .13 |  .9881 | .33 | .9221 | .53 | .7926 | .73 | .5834 | .93  | .2387
 .14 |  .9862 | .34 | .9172 | .54 | .7842 | .74 | .5702 | .94  | .2142
 .15 |  .9841 | .35 | .9122 | .55 | .7756 | .75 | .5568 | .95  | .1882
 .16 |  .9819 | .36 | .9070 | .56 | .7669 | .76 | .5430 | .96  | .1605
 .17 |  .9796 | .37 | .9016 | .57 | .7579 | .77 | .5288 | .97  | .1305
 .18 |  .9771 | .38 | .8961 | .58 | .7488 | .78 | .5144 | .98  | .0972
 .19 |  .9745 | .39 | .8904 | .59 | .7394 | .79 | .4995 | .99  | .0585
     |        |     |       |     |       |     |       | 1.00 | .0000
TABLE II.
Values of $\chi_\alpha$ for Values of ½(1+α).

 ½(1+α) | $\chi_\alpha$ | ½(1+α) | $\chi_\alpha$ | ½(1+α) | $\chi_\alpha$ | ½(1+α) | $\chi_\alpha$
 .50 | 1.2533 | .65 | 1.2877 | .80 | 1.4288 | .95  | 2.1132
 .51 | 1.2535 | .66 | 1.2928 | .81 | 1.4457 | .96  | 2.2740
 .52 | 1.2539 | .67 | 1.2984 | .82 | 1.4641 | .97  | 2.5071
 .53 | 1.2546 | .68 | 1.3044 | .83 | 1.4844 | .98  | 2.8915
 .54 | 1.2556 | .69 | 1.3109 | .84 | 1.5067 | .985 | 3.2097
 .55 | 1.2569 | .70 | 1.3180 | .85 | 1.5315 | .990 | 3.7333
 .56 | 1.2585 | .71 | 1.3256 | .86 | 1.5590 | .991 | 3.8854
 .57 | 1.2604 | .72 | 1.3338 | .87 | 1.5897 | .992 | 4.0639
 .58 | 1.2626 | .73 | 1.3427 | .88 | 1.6245 | .993 | 4.2784
 .59 | 1.2652 | .74 | 1.3523 | .89 | 1.6640 | .994 | 4.5419
 .60 | 1.2680 | .75 | 1.3626 | .90 | 1.7094 | .995 | 4.8779
 .61 | 1.2712 | .76 | 1.3738 | .91 | 1.7623 | .996 | 5.3278
 .62 | 1.2748 | .77 | 1.3859 | .92 | 1.8249 | .997 | 5.9776
 .63 | 1.2787 | .78 | 1.3990 | .93 | 1.9003 | .998 | 7.0465
 .64 | 1.2830 | .79 | 1.4133 | .94 | 1.9937 | .999 | 9.3870





























MULTIPLE CASES OF DISEASE IN THE
SAME HOUSE.


APPENDIX TO PAPERS IN BIOMETRIKA, Vol. VIII.
p. 404 AND p. 430.


By KARL PEARSON, F.R.S. 


I REGRET that a most careless algebraic slip has crept into my work on this 
subject. It stared me in the face when I saw the published number of Biometrika, 
and I cannot understand how it escaped me in MS. or proof. 


After obtaining the fundamental equations

$$\sigma_{p_s}\sigma_{p_t}\, r_{p_s p_t} = -\frac{st\,\bar p_s\,\bar p_t}{N} \quad \text{(xvi)},$$
$$\sigma^2_{p_s} = \bar p_s\left(1 - \frac{s^2\,\bar p_s}{N}\right) \quad \text{(xvii)},$$
where $$N = n\left(1 + \frac{n-1}{m}\right) \quad \text{(xviii)},$$

on p. 410, I continued on p. 411:

“Now let us write $x_s = s^2 p_s$; then clearly $\sigma^2_{x_s} = s^2\sigma^2_{p_s}$ ....”

Clearly nothing is more false! $\sigma^2_{x_s}$ would then equal $s^4\sigma^2_{p_s}$ and not
$s^2\sigma^2_{p_s}$, and thus the value of $\chi^2$, i.e.

$$\chi^2 = S_1^u\left\{\frac{s^2\,(p_s-\bar p_s)^2}{\bar p_s}\right\},$$

is erroneous.

It is not possible I think to obtain the value of $\chi^2$ in this indirect and brief
method. We must return to the multiple correlation formulae of my memoir in
the Phil. Mag., July, 1900, p. 161, and evaluate the determinant R and its minors
for the special values (xvi) and (xvii).

Following the lines of that paper let us write:

$$\sin^2\beta_s = \frac{s^2\,\bar p_s}{N} \quad \text{(xix)},$$

then $$\sigma^2_{p_s} = \frac{N\cos^2\beta_s\sin^2\beta_s}{s^2}$$

and $$\sigma_{p_s} = \frac{\sqrt N\,\cos\beta_s\sin\beta_s}{s} \quad \text{(xx)}.$$
Hence from (xvi):

$$\frac{N\cos\beta_s\sin\beta_s\cos\beta_t\sin\beta_t}{st}\; r_{p_s p_t} = -\frac{st}{N}\,\frac{N^2\sin^2\beta_s\sin^2\beta_t}{s^2t^2},$$
or $$r_{p_s p_t} = -\tan\beta_s\tan\beta_t \quad \text{(xxi)},$$

and agrees entirely with the value of the correlation used in my Phil. Mag. paper just
referred to.


In other words the determinant R and its minors $R_{ss}$ and $R_{st}$ will take precisely
the same forms as in that paper provided we

(i) replace Eq. (xi) of that paper by our result from (xix) above, i.e.

$$\eta_s = \cot^2\beta_s = \frac{N}{s^2\,\bar p_s} - 1 \quad \text{(xxii)},$$
and
(ii) remember that we must reduce the total number of our variants
$p_0, p_1, \ldots p_u$
by the two relations $S(p_s) = m$, $S(s\,p_s) = n$ (xxiii),

where m is the total number of houses and n of cases. That is to say our total
number of frequency groups, which are not wholly dependent but are correlated, is

$$(u+1) - 2 = u - 1.$$


We select as our dependent variates, which are fixed as soon as the others are
known, $p_0$ and $p_u$. Thus the value of the auxiliary determinant J, p. 162 of my
Phil. Mag. memoir, is:


clk | — 1 1 1 | 
Sa ei 
= 1 -—-»:; "34 
Bf Serge nates ipdite gate tiene oubaey 
PRs ee: | 
eee Eat eee 
If $\lambda = (1+\eta_1)(1+\eta_2)(1+\eta_3)\ldots(1+\eta_{u-1})$,
we find as in that memoir:
1 1 1 4 : 
on f n ade ce, Ale aad a _— wi 
J =(- 1) r(1 a saa ic) Keaeeegios (xxiv), 
xr 
Ja =(—1)"" einen | Fit, ahi Mee eRe e None ThE eae XXVv), 
i" ireiiw — 
r 1 1 
selec Let oe Tie a as ae 
Ju=(- 1 Xe = ae 
1 1 1 : 
ee ae See xxvi). 
1 + Ns 1+ Ns+1 1 a ( ) 
To evaluate these we note that
$$S_1^{u-1}\left(\frac{1}{1+\eta_s}\right) = S_1^{u-1}\left(\frac{s^2\,\bar p_s}{N}\right) = \frac{S_1^{u}(\bar p_s s^2) - \bar p_u u^2}{N},$$

but $S_1^{u}(\bar p_s s^2) = N$ as shewn on p. 411 of my Biometrika memoir. Hence:

$$S_1^{u-1}\left(\frac{1}{1+\eta_s}\right) = 1 - \frac{\bar p_u u^2}{N}.$$

Thus we have: J=(-1)"r ror, 


(—1)"7”r Put 
sot Ea (es or) 


(-1)"7 0 
(1+m,)(+m)’ 
Next as in the Phil. Mag. paper (p. 161): 
R=(-1)J)a, 

Ry, =(— 1)" J, cot? B,/r, 

Ry =(— 1)" Jz cot B, cot B,/r. 
Thus: $R = \bar p_u u^2/N$,

$R_{ss} = \cos^2\beta_s\,(\bar p_s s^2 + \bar p_u u^2)/N$,

$R_{st} = \cos\beta_s\sin\beta_s\cos\beta_t\sin\beta_t$.


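Finally we have:

$$\frac{R_{ss}}{R\,\sigma^2_{p_s}} = \frac{\cos^2\beta_s\,(\bar p_s s^2 + \bar p_u u^2)}{\bar p_u u^2\,\cos^2\beta_s\;\bar p_s} = \frac{1}{u^2}\frac{s^2}{\bar p_u} + \frac{1}{\bar p_s},$$

$$\frac{R_{st}}{R\,\sigma_{p_s}\sigma_{p_t}} = \frac{\cos\beta_s\sin\beta_s\cos\beta_t\sin\beta_t}{\bar p_u u^2/N}\times\frac{st}{N\cos\beta_s\sin\beta_s\cos\beta_t\sin\beta_t} = \frac{1}{u^2}\frac{st}{\bar p_u}.$$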

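Thus finally $\chi^2$ is given by

$$\chi^2 = S_1^{u-1}\left\{\frac{R_{ss}}{R\,\sigma^2_{p_s}}\,(p_s-\bar p_s)^2\right\} + 2S'\left\{\frac{R_{st}}{R\,\sigma_{p_s}\sigma_{p_t}}\,(p_s-\bar p_s)(p_t-\bar p_t)\right\}$$
$$= S_1^{u-1}\left\{\left(\frac{s^2}{u^2\bar p_u} + \frac{1}{\bar p_s}\right)(p_s-\bar p_s)^2\right\} + 2S'\left\{\frac{st}{u^2\bar p_u}\,(p_s-\bar p_s)(p_t-\bar p_t)\right\}$$
$$= S_1^{u-1}\left\{\frac{(p_s-\bar p_s)^2}{\bar p_s}\right\} + \frac{1}{u^2\bar p_u}\left\{S_1^{u-1}\, s\,(p_s-\bar p_s)\right\}^2.$$

But $S_1^{u}\, s\,(p_s-\bar p_s) = 0$,

therefore $S_1^{u-1}\, s\,(p_s-\bar p_s) = -u\,(p_u-\bar p_u)$.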
Thus we have: $$\chi^2 = S_1^{u}\left\{\frac{(p_s-\bar p_s)^2}{\bar p_s}\right\}.$$

In other words $\chi^2$ is to be found from the ordinary $\chi^2$ for the u+1 variates
excluding the houses with zero cases. I had excluded these houses from my
weighting formula, i.e.

$$\chi^2 = S_1^{u}\left\{\frac{s^2\,(p_s-\bar p_s)^2}{\bar p_s}\right\},$$

which was, however, in error. The higher multiple cases are now we see not
heavily weighted, but we are not to use the frequency of the zero case houses in
evaluating our $\chi^2$. This is the effect of the double relation between the u+1 p’s.
An additional p is cast out of our result as compared with the ordinary frequency
problem of goodness of fit.


I have next to consider how far this correction modifies the results obtained 
in my papers for numerical cases. 


For enteric cases we have as on p. 412 of my Biometrika paper

$$\chi^2 = \frac{(3398-3350)^2}{3398} + \frac{(78-56)^2}{56} + \frac{(2-1)^2}{1} = .678 + 8.643 + 1 = 10.321,$$

hence we have for n′ = 3, P = .006, or the odds are about 166 to 1 against such
a large divergence from chance.

For cancer cases we have from the returns on p. 432

$$\chi^2 = \frac{(\ldots)^2}{3124} + \frac{(9.4)^2}{20} + \frac{(4.16)^2}{1.84} + \frac{(.914)^2}{.086} = .022 + 4.418 + 9.405 + 9.714 = 23.559.$$

For n′ = 4, we have P = .0003 or the cancer house distribution is a very
improbable one.

If we deal with the experimental data that were obtained for probabilities on
the same basis as the cancer statistics we have:

$$\chi^2 = \frac{(3)^2}{313} + \frac{(3)^2}{29} + \frac{(.6)^2}{.72} = .029 + .310 + .500 = .839.$$

Or, for n′ = 3, the probability is over .60.
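
The arithmetic of the corrected test is easily verified. The sketch below is offered under the assumption that n′ denotes the number of frequency groups, so that for n′ = 3 the probability of a divergence as great or greater is exp(−χ²/2); the function names are hypothetical, and the enteric figures are read as calculated 3398, 56, 1 against observed 3350, 78, 2.

```python
from math import exp


def chi_square(observed, calculated):
    """Ordinary chi-square: sum of (observed - calculated)^2 / calculated."""
    return sum((o - c) ** 2 / c for o, c in zip(observed, calculated))


def p_for_three_groups(chi2):
    """Probability of as great or greater a divergence when n' = 3."""
    return exp(-chi2 / 2.0)


# Enteric houses with 1, 2 and 3 cases
enteric = chi_square(observed=[3350, 78, 2], calculated=[3398, 56, 1])
print(round(enteric, 3), round(p_for_three_groups(enteric), 3))

# Experimental series: chi-square as stated in the text
print(round(p_for_three_groups(0.839), 2))
```

For n′ = 4 the exponential form no longer applies, and the probability is to be taken from the usual tables of goodness of fit.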

Thus although I made a bad algebraic slip the new values confirm practically 
the old and Dr Webb’s cancer data suggest that there may very possibly exist 
a relationship between cancer and environment of some kind.


Since my papers in Biometrika were written, my attention has been drawn to 
a Special Report on Cancer in Ireland which was issued in 1903 as a supplement 
to the 38th Annual Report of the Registrar-General for Ireland. On p. 34 we
find that in the ten years 1876-1885 inclusive there were 12 multiple cancer
houses in the City of Dublin. In all these instances in only one house did the two
deaths occur in members of the same family. There are no cases—or at least no
cases recorded—of more than two deaths in the same house. 


Unfortunately the paper does not give either (i) the number of inhabited 
houses in the City of Dublin during the ten years under consideration, or (ii) the 
total number of deaths in the City of Dublin from cancer. What useful purpose 
the then Registrar-General could conceive such data would serve completely puzzles 
me*. He draws up a summary of his Tables, introducing it with the words:
“I venture to draw attention to some of the main facts which they disclose,” and
clause (7) runs: 


“That in some instances more than one case of cancer has occurred amongst different 
families living in the same house, or amongst successive occupants of the same house.” 


Now unless this is meant to be interpreted as a suggestion that the multiple 
houses are in excess, it must be anticipated from the mere random occurrence of 
cancer. Anyhow without further information the data on this point, as on many 
others in this Special Report on Cancer in Ireland, are wholly worthless and the 
publication does no credit to a Government Department. 


I have striven to obtain the requisite additional data from the present 
Registrar-General for Ireland. He most kindly informs me that the inhabited 
houses of the City of Dublin numbered 23,896 in 1871 and 24,211 in 1881; and 
for the Registration Area of Dublin, 34,118 in 1871 and 36,232 in 1881. All 
these data are from the Censuses of those years. The deaths from cancer in the 
Registration Area for 1876 to 1885 inclusive were 1714, but how many of these 
occurred in the City of Dublin he is not able to tell me. I presume the late 
Registrar-General must have known these deaths in order to detect multiple 
houses, but apparently they cannot now be ascertained. It is clear therefore that 
the data of the “Special Report on Cancer” must remain practically worthless. 


If we suppose the City of Dublin to have had a number of cancer cases
proportionate to its houses we might take:

$$\text{Cancer Cases in City} = \frac{\tfrac12(23{,}896 + 24{,}211)}{\tfrac12(34{,}118 + 36{,}232)}\times 1714 = 1172,$$

roughly ⅔ of the cases in the Registration Area.


Since the middle of our period is not very far from the Census year 1881, we 
might take 
n= 1172, m = 24,211. 


* Cancer in the Report is associated with alcohol, syphilis, smoking, etc., in an equally unscientific 
manner. Unless we know the incidence of each character or disease in the population, how is it possible 
to determine whether the association is merely due to chance or not ? 
Whence we deduce:

Calculated               Observed
$\bar p_1$ = 1116.6      $p_1$ = 1148
$\bar p_2$ = 27.0        $p_2$ = 12
$\bar p_3$ = 0.4         $p_3$ = 0

$\chi^2$ = 8.7 and P = .013, i.e. odds of about 76 to 1.
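
The calculated frequencies can be reproduced by a short sketch, on the assumption that the n cases fall at random and independently among the m houses, so that the number of houses with s cases is m times the corresponding binomial term. The function name `expected_houses` is hypothetical; n = 1172 and m = 24,211 are the values adopted above.

```python
from math import comb, exp


def expected_houses(n_cases, m_houses, s):
    """Houses expected to hold exactly s cases when n_cases fall at random on m_houses."""
    q = 1.0 / m_houses  # chance that a given case falls in a given house
    return m_houses * comb(n_cases, s) * q ** s * (1.0 - q) ** (n_cases - s)


calculated = [expected_houses(1172, 24211, s) for s in (1, 2, 3)]
print([round(x, 1) for x in calculated])

# Probability for n' = 3 and the stated chi-square of 8.7, with the corresponding odds
p = exp(-8.7 / 2.0)
print(round(p, 3), round((1.0 - p) / p))
```

The odds are here reckoned as (1 − P)/P.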


Here our deviations are not as before towards multiple houses, but cancer 
apparently avoids a house where it has paid one visit! It is difficult to believe
that the frequency should be so far below a random distribution in the case of 
multiple houses, and the matter would be still worse, if there were more cases 
relative to the houses in the City than in the Registration Area. I think we may 
safely conclude that the data for multiple houses provided by the Registrar-General 
for Ireland are probably more than 100 per cent. in error, and therefore are wholly 
worthless for the purposes for which they are apparently stated. We cannot 
suppose one visit of cancer to confer immunity on a house, and the fact that the 
multiple houses are so significantly short of the chance frequency suffices, I think, 
to discredit the data and the accuracy of the methods adopted in this Special 
Report on Cancer. 


Biometrika IX














STUDY OF THE VARIATIONS IN THE FEMALE PELVIS, 
BASED ON OBSERVATIONS MADE ON 217 SPECIMENS 
OF THE AMERICAN INDIAN SQUAW*. 


By ARTHUR BREWSTER EMMONS, A.B., M.D. (Harvard), Boston, 
Mass., U.S.A. 


THE object of this investigation was to determine, so far as possible, the 
variation in form of the “normal” human female pelvis. By “normal” it was
intended to exclude all pathological pelves, and to include all variations, not the 
results of disease. Our conception of “normal,” for a standard of comparison, 
should, I believe, include not only the average measurements, but also the 
minimum and maximum as well as the proportion of cases at regular intervals 
between these extremes. 


The bones of the American Indian of the earlier times, collected from various 
parts of North and South America and the adjacent islands, are said by the 
authorities on the Indian (1) to be entirely free from rhachitis, and other
diseases affecting the bones, as tuberculosis, osteomalacia, and syphilis, to be rare.
This statement was borne out in my series of specimens, for no evidence of these 
diseases was encountered. In an occasional elderly specimen, however, the
remains of an old osteo-arthritic process were found. This late change could 
have had little or no effect on the form of the pelves. 


Varying conditions, exclusive of disease, such as differences of nutrition, and 
certain habits as sitting up for long periods in early infancy, carrying heavy 
burdens in youth, are the common causes and most probable influences modifying 
the shape of the pelvis. Among the many Indian tribes from which the specimens 
of this series came, these factors may have been present at times in varying 
degrees, yet it seems fair to consider them all as from a pure unmixed race, and 
thus the series should yield a true type. 


A due proportion of variations can scarcely be obtained with certainty unless 
at least 200 specimens are used. A still larger series than mine might yield 


* Awarded the Boylston Medical Prize Essay, 1912. “The Boylston Medical Committee do not
consider themselves as approving the doctrines contained in any of the dissertations to which premiums 
may be adjudged.” 
slight or occasional variations from the standard here set. But it is felt that
this series of 217 specimens will form a group sufficiently comprehensive to 
include all important variations in approximately their due proportions, and serve 
for a standard of comparison with the male pelvis and the pelves of animals and 
other races of man, as well as a standard to differentiate the pathological. 


Material. The material on which this paper is based is to be found largely 
in the splendid collection of the U.S. National Museum in Washington. The 
remainder is in the Peabody Museum, Cambridge, Mass., and the American 
Natural History Museum, New York City. To the authorities of these insti- 
tutions the writer wishes to express his thanks for the privilege of using the 
specimens, which has made this study possible. He also desires to extend his 
grateful acknowledgments to Dr Hrdlicka of the U.S. National Museum for his 
kindness and advice. 

To obtain accuracy and to avoid error several special means were employed. 
A few of the specimens were somewhat injured. The exact relative position of 
these bones could not always be accurately fixed. In a few cases careful estimates
only could be made, but no specimens were used unless the relative positions 
of the bones could be obtained with a fair degree of accuracy. When broken 
specimens were used the measurement was marked (“a”) approximate. The use 
of the accurate “compas glissiére” for the greater part of the measurements 
also tended to reduce error. Nearly all measurements were made on disarticulated 
pelves. To hold the bones in proper relative positions the hand and the sand- 
box were found rather unsatisfactory. The apparatus seen in Plate V was 
therefore devised and used for the entire series. It is believed that the margin 
of unavoidable error in these measurements is not a great one. 


Measurements. Anthropologists differ widely in what they have considered the 
essential measurements of the pelvis. These differences probably arose because 
the pelvis is such a complicated architectural bony structure. 

Topinard (2) gave measurements on 207 pelves of animals and man. 
Verneau (3) made a large number of measurements on specimens from many 
different races. Turner (4), reducing the number of measurements made by 
Verneau, gave observations on specimens of many peoples collected on the 
Challenger expedition. From these I have selected only those few measure- 
ments which seemed to me to be most essential for comparison, and I have added 
a very few observations of interest in regard to one particular region, the pelvic 
outlet. 


In the last step in the evolution of man either from the primate to man or 
from one class of primate to another, whichever anthropological classification of 
man is used, the most essential change which concerned the pelvis was the 
assumption of the erect posture. The pelvis has carried on the function of 
child-bearing probably ever since a pelvis has existed. The function of weight- 
bearing, however, by the recent assumption of the upright attitude was transferred 
from four legs to the hind two. The pelvis, now become a very important girder
in the bridge structure, thus came to bear the weight not only of the fore part 
of the body but also of the head, and this head has since increased considerably 
in weight. With this large increase in the weight to be carried by the pelvis 
has come a marked change in the direction in which that weight is applied. 
These three factors, the change in direction, the extra weight, and the shortness 
in time since this change has taken place, would lead us to expect changes of 
form of considerable importance and with them variations, the frequent accom- 
paniment of recent change, in those parts to which this function applies. 


The typical “male” pelvis, free from the function of child-bearing, is built 
strong, high, close-knit, thick-boned, with a small cavity, the type best adapted 
to bearing weight. The typical “female” pelvis is, on the other hand, of lighter 
build, lower, more open, and with a roomy cavity, better adapted to child-bearing. 
The newly-acquired function of weight-bearing, however, tends to mould the 
female pelvis toward the male type, while the child-bearing function resists such 
changes in so far as they tend to interfere with its long-established and most 
vital function of child-bearing. To see how nature arbitrates between these 
opposing forces, in other words how with this change the function of child-bearing 
is preserved, was one of the reasons for selecting certain measurements. I have 
also included those diameters and indices with which to compare various groups 
and races. 


Modern pelvimetry, as employed in obstetrics, regards the “obstetric con- 
jugate,”* the shortest distance between promontory and pubic symphysis, as the 
chief factor of pelvic efficiency. Secondary in importance to this diameter and corre- 
lated with it, is the breadth of the pelvic “inlet,” as shown by the greatest trans- 
verse diameter. A comparison of these two dimensions indicates roughly the shape 
of this space. Lastly, it has been recently emphasized that the pelvic “outlet” 
is a not infrequent cause of difficulty in labour and, if contracted, its efficiency 
may be gauged by the inter-tuberal diameter correlated with the space posterior 
to it. The more exact estimate of these two spaces, so essential to child-birth, 
and the means of calculating them in the living, are the direct objects of certain 
of the measurements taken. 


Inlet. In life direct mensuration of the “ obstetric conjugate” (Diagram I, A—R, 
and Plate 11) or the conjugata vera cannot be satisfactorily performed. Calcula- 
tions in life from the “ external conjugate,” the distance from the tip of the spinous 
process of the last lumbar vertebra behind to the top of the pubic symphysis in 
front, as introduced by Baudeloque (5), though often suggestive are admittedly 
unreliable. We have, however, a much more reliable measure for calculation of 
this vera, which is the oblique or diagonal conjugate (Diagram I, A~-P), extending 
from the promontory of the sacrum to the lower border of the symphysis, as 

* The ‘‘conjugata vera” of the text-books is taken from the promontory of the sacrum to the top of 


the pubic symphysis and is a variable amount longer than the “ obstetric conjugate,” consequently I 
prefer this shorter measure though both terms are used synonymously. 
measured in life without difficulty in small pelves by means of two fingers through
the vagina. To calculate the vera after obtaining the diagonal conjugate the inner
surface of the pubic symphysis is palpated and two points observed; first, the height
of the point nearest the promontory above the symphysis (Diagram I, R—P);
second, the angle which the line on which these two points lie makes with the
diagonal conjugate (Diagram I, R—P—A). From these data we may calculate
in each case the amount to be subtracted from the diagonal conjugate to give
us the obstetric conjugate or vera (Diagram I, AP − NP = AR). This difference
(NP) was calculated in the dried specimens, and found to vary from 0.8 cm. to
3.2 cms. in my series.
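
The calculation just described may be sketched as follows. It is assumed here, as an illustration only, that A, R and P are treated as the corners of a plane triangle, so that the obstetric conjugate AR follows from the diagonal conjugate AP, the pubic height RP and the angle R—P—A by the ordinary cosine rule; the function name and the sample values are hypothetical and are not measurements from the series.

```python
from math import cos, radians, sqrt


def obstetric_conjugate(diagonal_cm, pubic_height_cm, angle_rpa_deg):
    """AR from AP, RP and the angle R-P-A (in degrees) by the cosine rule."""
    theta = radians(angle_rpa_deg)
    return sqrt(
        diagonal_cm ** 2
        + pubic_height_cm ** 2
        - 2.0 * diagonal_cm * pubic_height_cm * cos(theta)
    )


# Hypothetical values: AP = 12.0 cms., RP = 2.5 cms., angle R-P-A = 60 degrees
ap, rp, angle = 12.0, 2.5, 60.0
ar = obstetric_conjugate(ap, rp, angle)
np_difference = ap - ar  # the amount NP to be subtracted from the diagonal conjugate
print(round(ar, 2), round(np_difference, 2))
```

The difference depends on both the pubic height and the angle, which is why no single fixed deduction can serve for every pelvis.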

The following table gives the data used in the living for estimating the 
“obstetric diameter” or the conjugata vera. The pubic height was taken from 
the point on the pubic symphysis nearest the promontory, as an upper limit, to 
the point on the pubic symphysis nearest the tip of the sacrum, as the lower limit, 
which space may be called the length of pubic resistance (see Plate VI, 6 and 
Diagram I, R—P). 


Oblique Diameter     No. | Obstetric Diameter   No. | Difference        No. | Pubic Height      No.
 9.4 to  9.9 cms.      4 |  7.5 to  8.9 cms.     14 | 0.8 to 1.4 cms.    46 | 1.1 to 1.9 cms.     9
10   to 10.9 cms.     17 |  9   to  9.9 cms.     60 | 1.5 to 1.9 cms.    98 | 2   to 2.9 cms.   109
11   to 11.9 cms.     74 | 10   to 10.9 cms.     75 | 2   to 2.4 cms.    58 | 3   to 4.3 cms.    95
12   to 12.9 cms.     73 | 11   to 11.9 cms.     54 | 2.5 to 3.2 cms.    14 | broken              4
13   to 13.9 cms.     38 | 12   to 12.9 cms.     10 |       —             — |       —             —
14   to 14.9 cms.     10 | 13   to 14   cms.      4 |       —             — |       —             —

Average   11.78 cms.     | 10.68 cms.               | 1.76 cms.             | 2.8 cms.
Maximum   14.9  cms.     | 14    cms.               | 3.2  cms.             | 4.3 cms.
Minimum    9.4  cms.     |  7.5  cms.               | 0.8  cms.             | 1.1 cms.








The antero-posterior diameter being determined, we turn to the measure
of the width of the inlet, the transverse diameter. This distance cannot be
satisfactorily measured in the living. Estimates of this diameter made in the
usual way from the maximum inter-cristal and inter-spinous diameters are suggestive
but unreliable, as demonstrated by Scheffer (6), who reported a difference of
3.3 cms. in the inter-cristal measures of two pelves with equal transverse diameters.
My series contains two rather narrow pelves (♀62 and ♀200), the comparison of
which bears on this point; one a generally contracted or “pygmy” pelvis, the
other simply narrow. Three pelves, moreover, with nearly equally broad inlets,
show wide variations in their inter-cristal diameters :—


Pelvis   Crests (cms.)   Spines (cms.)            Transverse (cms.)   Obstetric diameter (cms.)
♀62      20.8            17.2                     10.7                 8.5
♀200     [illegible]     19 [decimal illegible]   10.3                 [illegible]
♀30      25.5            23.25                    14                   9.8
♀112     29.1            25                       14.1                 12
♀143     25.5            [illegible]              14                   9.2
It may be seen from this table that ♀200, with an inter-cristal diameter over
2 cms. broader than ♀62, yet has a narrower transverse diameter. Again ♀112,
with a transverse diameter practically identical with ♀30 and ♀143, has an
inter-cristal diameter 3.6 cms. greater. By these extreme differences we are forced to
conclude that accurate estimates of the transverse diameter of the inlet cannot
be deduced from inter-cristal and inter-spinous diameters.


Since we cannot measure satisfactorily the transverse diameter in the living, 
we are therefore forced to adopt for obstetric use some such rough rule as 
Williams (7) gives as follows :—“ Despite many inaccuracies, the external measure- 
ments are of considerable value, in that they serve to indicate with tolerable 
certainty the variety of pelves (contracted) with which one has to deal. Normally 
the distance between the spines is 2.5 to 3 cms. less than between the crests:
but in rhachitic pelves, owing to the flaring of the iliac bones, this proportion 
becomes deranged, and the two measurements approximate one another in length, 
the former frequently being equal to, and occasionally exceeding, the latter. If, 
however, both measurements are considerably below the normal, but preserve their 
usual relation to one another, and at the same time the external conjugate is 
also shortened proportionately, it is permissible to conclude that the entire 
pelvis measures below normal in all its diameters, or, in other words, is generally 
contracted.” The following summarising table of these measures may make the 
results of this series more graphic :— 


Inter-cristal         No. | Inter-spinous        No.          | Transverse of Inlet   No.
20.8 to 21.9 cms.       2 | 17.2 to 18.9 cms.      2          | 10.3 to 10.9 cms.       2
22   to 23.4 cms.      10 | 19   to 20.9 cms.    [illegible]  | 11   to 11.9 cms.      14
23.5 to 24.9 cms.      49 | 21   to 22.4 cms.     67          | 12   to 12.9 cms.      71
25   to 26.4 cms.      88 | 22.5 to 23.9 cms.     67          | 13   to 13.9 cms.     108
26.5 to 27.9 cms.      59 | 24   to 25.4 cms.     44          | 14   to 14.7 cms.      22
28   to 29.1 cms.      16 | 25.5 to 27   cms.      9          |        —                —

Average   25.76 cms.      | 22.66 cms. (difference 3.1 cms.)  | 12.95 cms.
Maximum   29.1  cms.      | 27    cms. (difference 6.75 cms.) | 14.7  cms.
Minimum   20.8  cms.      | 17.2  cms. (difference 1 cm.)     | 10.3  cms.





While the average difference between the inter-cristal and the inter-spinous was
3.1 cms., the variation between the extremes (1 to 6.75 cms.) of this series free
from disease is wide and would tend to indicate that little reliance could be
placed on deductions from such a comparison.

Superior strait Index = (obstetric conjugate (vera) × 100) / transverse diameter;

average 79.5, maximum 107.7, minimum 61.5.
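
The index is a simple ratio, and a short sketch may make its use plain. The function name is hypothetical; the worked figures are those given above for pelvis ♀62 (obstetric diameter 8.5 cms., transverse 10.7 cms.).

```python
def superior_strait_index(vera_cm, transverse_cm):
    """Obstetric conjugate (vera) x 100 / transverse diameter of the inlet."""
    if transverse_cm <= 0:
        raise ValueError("transverse diameter must be positive")
    return vera_cm * 100.0 / transverse_cm


# Pelvis 62 of the text: obstetric diameter 8.5 cms., transverse 10.7 cms.
print(round(superior_strait_index(8.5, 10.7), 1))  # 79.4
```

An index of 100 would mean an inlet as deep as it is wide; the lower the index, the flatter the inlet.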


These data were considered sufficient to give as accurate a measure of the inlet 
as possible. For simplification the oblique diameters of the inlet, which cannot
be measured on the living, were not included. 


Outlet. The pelvic outlet is commonly of very secondary importance as 
compared to the inlet. The space is a complicated one of irregular outline, 
and has received much less attention for these reasons. Occasional extreme 
contractions of this space have been recorded, usually, however, only after 
disastrous results of labour have occurred. The subject has lately been thoroughly 
reviewed historically from an obstetrical standpoint by Williams(8), who gives 
also his clinical observations on 1200 women. In analyzing this complex space 
a few special measurements were taken in the hope of rendering more simple 
and more exact the observations necessary for its estimation (see Plates III, VI, 7,
VII, 8, 9, 10, and Diagrams I, II, and III).





Diagram I (traced from Plate VI, 7, pelvis ♀106).

Diameters of the Inlet. Planes of the Outlet.

A. Promontory.
R. Nearest point of pubic symphysis to promontory.
A—R. Inlet, obstetric diameter, or vera.
P. Nearest point of pubic symphysis to sacrum.
R—P. “Height of Pubic Resistance.”
A—P. Diagonal conjugate.
N—R. Perpendicular from diagonal conjugate to obstetric conjugate.
N—P. Amount subtracted from diagonal conjugate to obtain obstetric conjugate.
S. Tip of sacrum.
T. Tuberosity, point of impingement.
S—P. Antero-posterior diameter of the outlet.
O—T. Perpendicular from antero-posterior diameter of outlet to inter-tuberal diameter.
S—T. Posterior sagittal diameter.
P—O. Pubic symphysis to foot of perpendicular.
Diagram II. Outlet.

[Figure: outline of the outlet below the pubic symphysis, marking the maximum,
average (9.79 cms.) and minimum (8 cms.) inter-tuberal diameters and the maximum
antero-posterior diameter (14.66 cms.).]

Circles represent the average fetal head circumference, 9.5 cms. The average,
minimum and maximum inter-tuberal diameters are indicated.

This diagram shows the effect of the narrowing of the inter-tuberal diameter.


A contracted outlet, as is clearly pictured in Williams’ article, means a 
narrowing of the space between the tuberosities, accompanied usually by the 
so-called “male arch,” combined with a short antero-posterior diameter. The 
obstruction caused by this narrowing may be greatly increased by a forward 
position of the tip of the sacrum, or may be decreased by a backward position 
of that prominence (see Diagram III). The determination of the inter-tuberal 
diameter is, therefore, of prime importance, that of the “posterior-sagittal” 
(Diagram I, S—T), the distance from the tip of the sacrum to the inter-tuberal
line, is secondary to it and correlated with it. 


One of the first difficulties met with in estimating the outlet was the 
determination of the points on the tuberosities from which to measure. Owing 
to the soft parts this difficulty is even greater in the living. While in some 
pelves the angle of the tuberosities was such that the points could easily be 
selected, in others the rounded and complex curves made this selection a mere 
guess. To overcome this difficulty and to obtain consistent results, a circular 
sheet of transparent celluloid was marked with concentric circles 0.5 centimetre
apart (Plate VII, 8). This sheet was applied to the inner edge of the tip of the
sacrum, as one point, and to the two tuberosities as the other essential points
of resistance (cf. Diagrams II and III). The circle which just touched these
three points was read off as representing the size of the fetal head which might 
pass that particular outlet, and the points of contact were marked as the “tuber- 
osities.” Often the circle was tangent for a short distance to the tuberosities. 
The middle of this line of tangency was taken as the point of maximum resistance. 
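
The reading of the celluloid scale can be imitated by a little geometry. The following is a hedged sketch only: it treats the two tuberosities and the tip of the sacrum as three points in one plane, placed symmetrically about the mid-line, and takes the circle passing through all three. The function name is hypothetical. With an inter-tuberal diameter d and a posterior-sagittal distance p, such a circle has the diameter p + d²/(4p).

```python
def head_diameter_through_outlet(inter_tuberal_cm, posterior_sagittal_cm):
    """Diameter of the circle through both tuberosities and the sacral tip.

    Assumes a symmetrical, planar outlet in which the three points of
    resistance are treated as mathematical points.
    """
    d, p = inter_tuberal_cm, posterior_sagittal_cm
    if d <= 0 or p <= 0:
        raise ValueError("both measurements must be positive")
    return p + d * d / (4.0 * p)


# Figures quoted with Diagram III: inter-tuberal 8 cms., posterior-sagittal 7.3 cms.
print(round(head_diameter_through_outlet(8.0, 7.3), 2))  # 9.49
```

On these assumptions an inter-tuberal diameter of 8 cms. with a posterior-sagittal of 7.3 cms. admits a circle of very nearly 9.5 cms., which agrees with the figure given with Diagram III.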


Diagram III. Outlet.

[Figure: outline of the outlet below the pubic symphysis, with an inter-tuberal
diameter of 8 cms., a minimum posterior-sagittal of 6 cms., and a sacral movement
of 1.5 cms. backward.]

Circles represent the average fetal head circumference, 9.5 cms.
Inter-tuberal diameter of 8 cms. requires a posterior-sagittal of 7.3 cms. or more (continuous line).
Normal sacral movements lengthen the posterior-sagittal 1.5 cms. (broken line).
Reduction of the posterior-sagittal to minimum, 6 cms. (broken line).


The antero-posterior diameter of the outlet (Diagram I, P—S) was measured 
from the nearest point of the inner surface of the pubic symphysis to the tip of the 
sacrum. The normal rotation of the sacrum on its axis, usually the second sacral 
segment, may during parturition lengthen this diameter, as well as the posterior 
sagittal, a variable amount, from 1.5 to 2 cms. (9). Such lengthening would allow
the head to pass the tuberosities more posteriorly ; this in turn would bring the 
points of resistance on the tuberosities a variable distance back on those curved bony 
prominences, and, depending on this curve, the points of impingement of the 
head would fall a variable distance further out. Consequently we see that the 
moving backward of the tip of the sacrum for this circular passenger enlarges 
the available space not merely directly in proportion to the distance backward, 
but more nearly by the square of that distance. The figures of the measurements 
taken do not include such increase in the outlet space from the mobility of the 
sacrum (Diagrams I and II). 
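
The effect of the sacral movement may be put in figures. This is a sketch under stated assumptions, not a measurement from the series: the outlet is taken as planar and symmetrical, the head as a circle resting on the two tuberosities and the tip of the sacrum, and the function names are hypothetical. The values 8 cms., 6 cms., 1.5 cms. and 9.5 cms. are those quoted with Diagram III.

```python
import math


def circle_diameter(inter_tuberal_cm, posterior_sagittal_cm):
    """Diameter of the circle through the tuberosities and the sacral tip."""
    d, p = inter_tuberal_cm, posterior_sagittal_cm
    return p + d * d / (4.0 * p)


def required_posterior_sagittal(inter_tuberal_cm, head_diameter_cm):
    """Posterior-sagittal needed for a head wider than the inter-tuberal diameter."""
    d, big_d = inter_tuberal_cm, head_diameter_cm
    if big_d < d:
        raise ValueError("head is narrower than the inter-tuberal diameter")
    # Larger root of p**2 - D*p + d**2/4 = 0 (sacral tip behind the centre).
    return (big_d + math.sqrt(big_d * big_d - d * d)) / 2.0


print(round(required_posterior_sagittal(8.0, 9.5), 2))  # about 7.3 cms.
print(round(circle_diameter(8.0, 6.0), 2))              # minimum posterior-sagittal
print(round(circle_diameter(8.0, 6.0 + 1.5), 2))        # after 1.5 cms. of movement
```

With the posterior-sagittal at its minimum of 6 cms. the circle falls short of 9.5 cms.; a backward movement of 1.5 cms. brings it just above that figure.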
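The following table gives the figures of the outlet in convenient condensed
form :—

Arch (pubic)     No.          | Antero-posterior    No.          | Inter-tuberal       No.          | Posterior-sagittal   No.
Roomy            [illegible]  |  9 to  9.9  cms.    [illegible]  |  8 to  8.9  cms.    [illegible]  |  6 to  6.9 cms.      [illegible]
Medium-roomy     [illegible]  | 10 to 10.9  cms.    48           |  9 to  9.9  cms.    [illegible]  |  7 to  7.9 cms.      [illegible]
Medium           [illegible]  | 11 to 11.9  cms.    89           | [illegible]         [illegible]  |  8 to  8.9 cms.      [illegible]
Medium-narrow    [illegible]  | 12 to 12.9  cms.    58           | [illegible]         [illegible]  |  9 to  9.9 cms.      [illegible]
Narrow           [illegible]  | 13 to 14.66 cms.    19           | 12 to 12.75 cms.    3            | 10 to 11.7 cms.      [illegible]

Average                       | 11.59 cms.                       |  9.79 cms.                       |  7.56 cms.
Maximum                       | 14.66 cms.                       | 12.75 cms.                       | 11.7  cms.
Minimum                       |  9    cms.                       |  8    cms.                       |  6    cms.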

Small outlets, specimens in which the “fetal head” scale measured 9.5 to
9 cms. in diameter, numbered twenty. As may be seen in the diagrams, the
backward movement of the tip of the sacrum to the normal amount would
enlarge these outlets sufficiently to allow the easy passage of the average fetal
head*, and they can thus be considered efficient pelvic outlets.


The ischial spines, commonly the point of narrowest diameter passed by the 
fetal head in its descent through a normal pelvis, may well be an important factor 
in the mechanism of labour. In my series the attempt was at first made to 
measure this diameter, but in so many specimens it was found that this prominence 
was broken off in part or entirely, that anything like accuracy was impossible. It 
was decided, therefore, that any figures so obtained would be deceiving rather 
than helpful, and none are given. 


An index of the pelvic outlet was estimated as follows :—

Outlet index = (inter-tuberal diameter × 100) / antero-posterior of outlet;

average 84.26, maximum 108.9, minimum 64.
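
As a worked example, the index may be figured for the wide and the narrow outlet described in the legend to Plate VII, 9 (inter-tuberal 12.75 and antero-posterior 14.66 cms.; inter-tuberal 8 and antero-posterior 12 cms.). The function name is hypothetical and the sketch is meant only to show the arithmetic.

```python
def outlet_index(inter_tuberal_cm, antero_posterior_cm):
    """Inter-tuberal diameter x 100 / antero-posterior diameter of the outlet."""
    if antero_posterior_cm <= 0:
        raise ValueError("antero-posterior diameter must be positive")
    return inter_tuberal_cm * 100.0 / antero_posterior_cm


# Wide and narrow outlets of Plate VII, 9.
for label, tuberal, antero_posterior in [("wide", 12.75, 14.66), ("narrow", 8.0, 12.0)]:
    print(label, round(outlet_index(tuberal, antero_posterior), 1))
```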


The two diameters used in this index, however, lie in different planes, and can,
therefore, hardly be considered a true measure of this irregular exit space
(cf. Plate VI, 7 and Diagram I, S—P and S—T). The distance between these
two diverging planes was estimated at the point opposite the tuberosities by erecting
a perpendicular (O—T) with the antero-posterior diameter (P—S), as a base line,
to the inter-tuberosities line (T). The distance from the pubes along the base line
to this perpendicular was also measured to show how far back the tuberosities
were placed (Diagram I, P—O). The last of these distances (P—O) shows
approximately how far posterior to the pubes the broadest part of the fetal head may
be forced to pass, when the tuberosities are sufficiently narrowed. The first
measure (O—T) represents the amount to which the birth canal is prolonged
downward and backward, when the tuberosities are thus narrowed.

* Biparietal diameter 9½ cms., sub-occipito-bregmatic 9¼ cms., Farabeuf and Varnier.


In recapitulating our findings in respect to the outlet it is fair to say, first 
that this is a complicated space, the points of obstruction being on different and 
diverging planes, varying in divergency, thus multiplying the number of elements 
in the problem. Some of these factors I have tried to elucidate, but others, 
including the ischial spines and the variations in size, shape, and malleability of 
the fetal head, must be left for further studies on the living, and for nature’s final
test of labour. 


By means of the “fetal head” scale the points of impingement on the tuber- 
osities have been determined with comparative accuracy on this series of dry bones. 
These points on the tuberosities were found usually to be on the inner lip of the 
tuberosity at a varying distance from the symphysis. This variation was con- 
siderable and depended on three factors, the curve of these bones, their distance 
from each other, and their distance from the sacrum. By this means it was 
also found that some pelves which appeared to have small generally contracted 
outlets were in reality quite passable. 

The ischial spines, though not available here, must be considered as probably 
an important factor in the mechanism of labour at or near the outlet. 


Finally, emphasis is laid on the importance of the space between the tuber- 
osities, and, when this is reduced, on the available space behind, measured by the 
posterior-sagittal diameter. This combination is probably the most accurate 
practical measure to show the significance of the variations of the outlet. 

The separate bones of the pelvis were measured. The length of the innominate
bone was obtained by means of the graduated measuring board and block, thus
giving the maximum distance from crest to tuberosity. The width of the ilium was
measured with the “compas glissière” from the anterior to the posterior superior
spine. An index for the larger innominate was figured = (breadth × 100) / height.

Also a Pelvic index = (height (highest innominate) × 100) / breadth (inter-crests).

The sacrum owing to its position of importance and to its great variability 
is a most interesting bone. Its shape often modifies markedly the pelvic cavity. 
Its height was taken from the middle of the anterior surface of the promontory
to the anterior surface of the tip of the sacrum. The maximum breadth was
taken with the instrument parallel to the anterior surface of the bone.

Sacral index = (breadth × 100) / height

Four “observations” were added. 1. The number of sacral segments. 2. The 
sacral curvature was estimated as “slight,” “moderate,” or “pronounced.” 3. The 
segment noted at which the curve began. 4. False promontories were noted and 
used for the measure where the cavity was involved in the measurement. 
Summary. The features brought out by this investigation are: 1. The
variation in the size, shape, and type of the female American Indian pelvis as a
whole. 2. The variation in the pelvic inlet and 3. outlet. 4. The variation in
the number of sacral vertebrae. 5. The frequency of a false promontory.


1. The variation in size of the normal female pelvis is considerable, as is 
illustrated in Plate I, 1. 


The shape of the pelvic cavity also varies much, as is suggested by Plate II, 2, 
comparing a flat inlet with a rounded one; and Plate VII, 9 contrasting a wide 
outlet with a narrow one. 


The “male” type or high, narrow-arched pelvis is shown in Plate III, 3, com- 
pared with the “female” or extremely shallow, broad-angled pelvis. The “male” 
type suggests a birth canal with a small bore, long cylindrical cavity ending in a 
narrow-arched small outlet. The effect on the mechanism of labour is to increase 
its difficulty proportionately. 

The determination of the sex of a pelvis is not always easy, and in a small 
percentage (perhaps 1 to 3%) of cases is next to impossible, even with the aid
of the whole skeleton. The breadth of the great sacro-sciatic notch was found 
to be the most reliable guide. The acuteness of the sub-pubic angle was next in 
usefulness. The thickness of the bones and finally the skull and long bones 
were referred to in doubtful cases(10). No specimen was included unless deter- 
mined by these signs to be female. 


2. Considerable variation is seen in the diameters of the inlet. Michaelis (11) 
reported 1,000 cases carefully studied as to pelvimetry and the results of labour. 
Litzmann (12) continued the work, reporting a second 1,000 cases. The standard 
determined by these men has been accepted throughout the world. Litzmann (12) 
considers all pelves contracted if the conjugata vera is 10 cms. or less in a generally
contracted pelvis, and 9.5 cms. or less in a flat pelvis. By this standard my series
of Indian pelves show 63 “contracted pelves,” or 29%, as follows :—


Generally contracted   9 to 10  cms.     6  |  Flat   9   to 9.5 cms.    43
Generally contracted   8 to 8.9 cms.     1  |  Flat   8   to 8.9 cms.    12
                                         —  |  Flat   7.5 to 7.9 cms.     2

A pelvis was classed “generally contracted” if the transverse diameter was 
less than 12 cms., at least one centimetre below the average: with, at the same 
time, an obstetric diameter of 10 cms. or less. A pelvis was considered “ flat” 
if the transverse diameter measured 12 cms. or more with the obstetric diameter 
9.5 cms. or less.
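
The two definitions amount to a simple rule, which may be sketched as follows. The function name and the label “not contracted” are hypothetical; the limits are those stated above, and the examples are the transverse and obstetric diameters given earlier for pelves ♀62, ♀143 and ♀30.

```python
def classify_inlet(transverse_cm, obstetric_cm):
    """Apply the limits stated in the text to a single pelvis."""
    if transverse_cm < 12.0 and obstetric_cm <= 10.0:
        return "generally contracted"
    if transverse_cm >= 12.0 and obstetric_cm <= 9.5:
        return "flat"
    return "not contracted"


# (transverse, obstetric diameter) in cms., as given for these pelves in the text.
examples = {"62": (10.7, 8.5), "143": (14.0, 9.2), "30": (14.0, 9.8)}
for pelvis, (transverse, obstetric) in examples.items():
    print(pelvis, classify_inlet(transverse, obstetric))
```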


Three only of the seven generally contracted pelves showed the inter-cristal 
and inter-spinous diameters reduced more than 1°5 cms. below the average, and 
thus might be said to “suggest” lateral contraction within normal limits. The 
other four showed no such suggestion of contraction. 
The flat pelves, 56 in all, show that at least slight contraction of the inlet
is to be expected in one pelvis in every four, while moderate to serious narrowing 
of this all-important space is to be expected about once in twenty pelves. 


3. The outlet showed wide variation in size and shape. By the use of the 
“fetal head” measure it was determined that no pelvis, in all probability, was 
too small to pass the average fetal head. It was found that a narrowing of the 
inter-tuberosities diameter was the most important single factor in reducing the 
size of the available space. Next in importance was the shortening of the space 
behind this line, or in other words a reduction of the posterior-sagittal diameter, 
found with but slightly greater frequency in pelves in which the sacrum contained 
an increased number of segments. The combination of these two factors, short 
inter-tuberosities and posterior-sagittal diameters, was essential to reduce seriously 
the efficiency of the outlet. The normal movement of the sacrum, allowing the 
tip to swing backward, enlarged the available space considerably. The diameter 
of the ischial spines, though probably of great importance in the efficiency of 
a pelvis, was not available in this series of ancient pelves. 


4. A numerical variation of the sacral vertebrae was noted in 47 pelves 
or 21.7%, or one in every five pelves. The number of segments ranged from
four to six (see Plate IV). By an increased number of segments is meant sacra 
in which the six segments were all sacral in character, or those in which there 
was a transitional vertebra, whether lumbar or coccygeal in character. Such a 
classification, as was pointed out to me by the late Professor T. Dwight, is more 
practically useful than anatomically correct. Pelves with small outlets were 
found slightly more often among those with an increased number, and it is 
possible that this increase in segments may be one small factor in reducing the 
size of the outlet. Aside from the possibility of slightly infringing on the outlet 
space, in rare cases the numerical variation appears to play no important part in 
the variation in size and shape of the pelvic cavity (Plate VII, 9 and 10). 

5. False and double promontories were found in twenty pelves, about one 
in every eleven pelves. These false promontories varied from a marked prominence 
of the second sacral vertebra (Plate VI, 6), an equal prominence of the first and 
second sacral segments, to a projection of the top of the last lumbar vertebra 
beyond the sacrum nearer to the pubes (Plate V, 5). In nearly all cases false 
promontories were associated with transitional vertebrae or an increased number 
of sacral segments. The apex of the lumbo-sacral bend falls at a point pro- 
portionally distant, in all probability, from the sacro-iliac attachment or the 
“vertebra fulcralis” of Welcker (13), and in these transitional cases this distance 
brings the point on another vertebra than the usual one. False promontories 
occurred in four specimens classed as moderately generally contracted and in one 
with a small outlet, but in no other “contracted” pelvis. That is a little more 
often than the general average, but in all probability has no special significance. 
From the general appearances as well as from the measurements it seems fair 
to say that the false promontory has no appreciable effect on the pelvic cavity. 
Congenital dislocation of the hip was found in one specimen (♀10). The
displacement, of the right femoral head only, was upward and backward where 
a new socket was formed on the ilium. The deformity produced a tilt to the 
pelvis of approximately 10 degrees downward to the right. A new acetabulum 
was formed, and the femoral head and neck showed practically no change. As 
will be seen by the measurements no alteration is found in the pelvic cavity in 
spite of this deformity. 


In conclusion I would emphasize that a “normal” standard for the pelvis 
should include not only an average, but also a series of measures graded from
minimum to maximum. A large series would seem necessary in order to include 
the wide variations found among non-diseased pelves. Owing to these great 
differences found among the specimens of such a comprehensive series, less stress, 
I think, should in future be put by the anthropologist on those small differences 
seen in a few or even in a small series of pelves of separate tribes or races, but 
rather the close resemblances should be emphasized in contrast to the great 
differences noted between animals and man. And the changes might be further 
studied and traced which have followed the assumption of the erect attitude. 


Such changes as the differences in the sexes are receiving the attention of 
the embryologist (14). The anatomist is studying the variations of the spine 
and their significance. The obstetrician is constantly seeking more light on
the interpretation of peculiarities of this complex bony structure. It was my
desire, therefore, to suggest to the anthropologist an application which may be 
made of his work of collecting and classifying, identifying and interpreting such 
valuable material as is here used, and also to direct his attention to the applied 
anatomy and physiology of the pelvis: that is the function of weight-bearing 
and child-bearing. 


The popular idea has been that the function of child-bearing among the 
American Indians was always efficient and easy. Engleman (15) confirms this 
idea by saying that labour as a rule among North American Indians is short 
and easy, averaging two hours. As civilization is approached, however, labour 
becomes more extended. Thus half-breeds, as the modern Mexican Indian, 
average three to four hours. He further states that accidents during labour 
are rare when women do not marry out of their tribe, for the child’s head is 
in proportion to the pelvis. But deviation from the natural state, he continues, 
brings difficulty. The example is given of the Umpqua tribe, who have inter- 
married with whites and have died, it is stated, during labour in consequence 
of the disproportion between the larger head and the ordinary pelvis. When 
the father of the child, however, was also an Umpqua no such trouble is known 
to have occurred. 


My series of dried specimens bears out in general this clinical evidence, for
most labours might well have been short and easy so far as the pelvis was
concerned. About one-fourth, however, would require a rather smaller fetal
head than the average, as found in civilized countries and peoples, in order for
labour to be short and easy. About one in ten would require a small and
perhaps malleable head to make successful labour possible. Observations on
the heads of new-born pure-blooded Indian infants as to size and malleability
would be a welcome supplement to our present knowledge of pelvic efficiency.

8. Celluloid circular scale for Outlet, dark circle diameter 9 cms., string
marks tuberosities. Pelvis ♀15 California, 13,232, six sacral vertebrae.

9. Variation of Outlet. String marks intertuberal diameter. Wide: ♀13, California 13,554, Intertuberal 12.75,
Antero-posterior 14.66 cms. Narrow: ♀34, Massachusetts, 47,998, Intertuberal 8, Antero-posterior 12 cms.

10. Six sacral vertebrae, all sacral in character. Roomy Outlet. Tuberosities marked.
10.1 cms., Antero-posterior 1.75 cms., California, 13,232 (same as photo. 8).


(1) 
| No. sa A Tribe or 9.0 Inter- Z of a) a <i | 3 + | Ba os
No. Locality |spines| © 23 | 8& | 5 | 8 S23 td > | St |e 3 
& |°a/68| A |< al 2 E qn a 
| : 
1 | 11,970 Tennessee 7 (3h ae SA | aS ee 78 17 
2 | 57,506 Ohio ‘5 | 11°25) 1:25] 81 |10°8 | 105 | 9 83 | 4:5 
3 | 26,630 -25/11°75| 15 | 94 |11 | 95 | 81 | 68 [53 
4 | 26,989 : 3 | 95 | 18 | 745/116 |12 |103 | 83 | 45 
5 | 13,551 California g 22 | 11°23 1 86°2/ 10°6 | 11 | 92 77 =| 4 
6 | 13,238 » 5 2 “7 ‘75 | 10°75| 1 91°4| 11°15 | 11 10 8 4°2 
7 | 13,240 “= 27 245 |14 |11°4 | 10-1 | 13 a: iT 11 10°1 77 | 42 
8 | 13,239 = 26°25 | 22°75 13°25! 12°3 | 10°6 is. | & 11 9°5 9 67 | 58 
9 | 13,286 ” 25°5 | 20°7 | 13°25] 12°3 | 11 3 | 88 10°8 | 10 8°85 | 7:4 | 4°5 
10 | 13,448 a 25 22 13 ; 12°1 | 10°4 17 | S0- [109 11 99 | 76 | 45 
11 | 57,883 fe 275 125 (1325/1155 | 95 | @ | 711/13°75/195 |10 | 105 | 45 
| 12 | 13,553 = 25°5 20°5 | 13°75/11°5 | 10°2 13 | 74°2 13 10 9-1 75 | 4°85 
| 3 | 13,554 % 26°25 | 21 14°25 | 11°5 9°5* 2 | 66°6 | 14°66 | 13 | 12°75 | 8:24)3°6 | 
| 17 | 13,546 _ 27" 24°5* | 13 14°‘7 | 13 1°7 | 100 12°75 | 11 | 10°2 75 | 56
18 | 13,547 ” 25°5 | 238 | 13 12°3 | 118 1°3 84°6 | 11°25 | 11 | 10°1 | 7°38 | 4:6 
19 | 13,548 * 26°25 | 23 13°56 | 12°2 | 11 | 1°2 81°4 | 12 10 9 | 7°7516 
20 | 48,009 Mexico 24°75 | 21 13°9 | 11 98 | 1°2 70°5 | 12°25 | 10°5 | 95 | 73 | 6-25 
21 | 57,7822 California 27 | 24-25 | 14 122 | 10 22 | 71°4|10°75/12 |11°7 | 72 | 56 
22 | 58,143 Iroquois, N. Y. | 27 25°5 | 13°25|12°4 | 10-2 | 22 | 73°6| 9 10 | 98 | 75 | 4°65 
23 8.3592 Flat Head 27°25 | 25°5 14°25 | 12 10°25 1°75 | 71°9)} 11°1 11 10°3 6°6 5°5 
24 | 57,458 New York 25°25 | 23°5 | 12°5 11°7* 9°5a 2°28 76 12 ll | 9-1 St |] 
25 2,347 Kentucky 25°25 | 22 | 13 12 10°5 15 | 80°8/ 12 11°5 10°9 8-2 | 51 
26 | 2,346 ” 28°5 | 26°25 | 1 13°2 11°5 ry ot SFi tas 10°5 9°5 77 | 65 | 
28 | 11,972 Tennessee 23°5 | 22 113 13°5 | 12 1° | 92°3]11°75/11°5 | 10-2 8°6 | 4
29 | 11,861 : 27a |o4e 114) «| 126 | 11 16 | 786) 11 10°75 |10°1 | 7°8 | 4 
30 | 57,512 » 25°5 | 23°25 | 14 115 | 9°8 | Se | 70 12°25 | 12 10°3* 8°8 | 4°6 
8 27,213 s 26 | 24:25 | 13 11°5 | 10°25| 1°25] 78°8| 10°25 | 10 9°7 7 | 4°15 
32 | 32,435 Massachusetts 26 23°25") 13 11°5 | 10 Sg Mig a a 8°8 | 6°58 | 57 
33 | 10,262 « 23 |20 |12 125 | 11 15 | 833/118 9 81 | 6 |6-4| 
34 | 47,998 pes 28°5 | 27 13°5 | 1371 1] 2°1 | 80°4) 12 10 8 | 8 6°71 | 
35 57,779 Arizona 25 23°3 | 13 98 | 7°75) 2°05] 59°6| 11 10°5 | 10 | 6°5 | 5:4 
86 | 58,005 Ohio 25 22°5 | 12 12 11 ] 83°3 | 12 12°5 | 10°8 95 |4 
37 | 58,022 a 27°5 | 25" | 12 14°8 | 14° 08 | 107-7/13 |12 |103 | 8 |5%5 
| 38 | 58,049 Pa 26 ‘ 13 11°7 9 2°7 69°2 | 12°5 11 10°4* 6°5* | 7 
| 39 | 58,056 s 27% ‘258 12°75 | 132 |11°5 | 17 | 96°2) 11°25) 11 99 | 8 |47 
40 | 58,023 = 26°25 | 23 13 14°] 13 11 |100 {12 12 2 7°8* 16 
41 | 58,057 = 23" | 20°75" 12 13-2 |10°25| 2°95] 85°4|10°25| 975) 88 | 76 | 4:5 
42 | 58,453 ‘ 25°75 | 235 | 135 | 11-2 | 10 12 | 74 {12 |115 |102 | 8 |5%5 
43 | 58,463 ra 28 |25 14°5 | 123 | 105 | 1:8 | 724/123 | 12 10°6 | 9 |5 
Pte: a a | 
| U.S. N.M. | | | | 
| 44 227,463 Alaska 23°1 20°56 | 11°2 | 12 | v7 2°3 86°6 | 10°3 9 10°6 6 | 5°4 
45 | 225,473 Apache 25°75 | 23°8 13°9 | 12°7 | 10°2 2°5 75°5 | 12°2 | 11°5 | 10°7 87 |6 
46 | 228,361 ” 25°5 | 22 11° | 11°9 | 102 ta 88°7 | 10°2 95 | 85* | 7:2 | 3-9 
47 | 226,290 Arizona 26°5 1/23 |13°8 |11°8 | 95 2°3 | 69°5/12°2 | 9°75| 84 | 71 | 63 | 
is Siac fale | et ee 
1. One sacral segment probably from coccygeal end. 3. Last lumbar articulates with left
sacral ala. 5. Four sacral segments. No evidence of more at either end. 6. First sacral
transitional. False promontory. 7. False promontory. 10. Congenital dislocation of right hip. Two
sockets. Inlet diagonals, right 12.3, left 12.3. 11. Coccyx ossified to sacrum (not included), distorted.
12. False promontory. 13. Wide outlet (Plate VII, 9). 15. Well proportioned symmetrical sacrum
(Plate VII, 8 and 10). 17. Coccyx ossified to sacrum (not included). 20. Tapering sacrum.

(Note: “a” means an approximate measurement often due to a broken bone. Italics signify high or low
value or some peculiarity referred to in text.)
American Indian Squaws.
aoe a Se eS Ae eS a 
cars OvTLer INNOMINATE Bone Sacrum 
ae Z ere i as = 
3 | 4| 3 | oe z | | 
Bj os| = i fa ar) ats Spal eo" a om ee ae “oe ae | 
3 go(a| 82 | 2 |38 33/3 "\3e/3¢/5/2) 38 | = [28] 2& j8a] 8 | 
Bs OR] ¢ Sq 3 os | OS as | 3 5 = os ‘S Ay = pa) 3 
é Hale gee ae tampa ee ae a feel Slee) 4 
o a es ie |" 02 a | 
av et. pele taal Ree ee [ra Se ee , ae a es, SS bares Late 
— ea 
| 3:3 | 3 Medium} 80 | 19:8 14 70°8 | 19°8 | 14 — | 776/10 11°5 6 | Moderate Ist 115 
5 | 4. +3 M | 83:3 | 20°6 | 15°2 — | 21°1|15°6 | 73°9 | 77-4} 11°1 | 12°4 5 M 3rd 112°6 
“3 | 3°3 | 2°9 Narrow | 73°6| 19°2 13°8 — | 19°4/13°7 | 70°6 | 74:6} 11°6 a | 5 Slight Ist 95°7 
5 3°5 =| 3°4 Roomy | 88-6 | 19°9 | 15:4 | 77°4 19°9 | 15°4 — |75:1| 9°99 |11°4 5 | M 1 115°1 
| [3-7 | 23 M 86:8 | 18°5 | 14 — | 185] 14:3 | 77°3/ 787} 6-3 | 11 4 \Pronounced 1 1746 
2 126 | 43 KR 89°6 | 18°3 | 14°5 — | 186) 14°4 | 77°4| 78:3/10°5 | 10°5 5 M 2 | 100 
2 3 2°4 R 91°8 | 19 14°75 | — | 19°2 | 14°65 | 77°6 | 71:1 } 10°95 | 12°6 5 | Ss 4 115°1 
8 3 3°5 N 81s | 20°4| 15°52 | 76 =| 20°44) 15°38 | — | 73-9] 1071 | 1295! 5 S | 3 | 191-2] 
5 13 3°4 N 81:8} 18°8/ 14" | -— | 18°8| 15 79°8|73°7| 97 [121 | 5 s | 1 | 194-7] 
5 36 |265| R 90°8 | 19°7 | 13°5 | 68-5 / 19-4) 138 | — | 74-4/ 93 111-9 | 5 S | 1 | 1979 
5 45 |1°7 M 75 | 19°9/ 16 80:4} 19°99 15°6 | — | 72-4] 86 | 10°95| 5 Po} 2-137 3} 
“85 2°75 | 2°28 M 82°5 | 19 14°4 | 75°8 | 19 13°75 | — | 74°1 | 10°25 | 11°1 5 Ss | 4 | 108°3 
ws 3 ? R 87 |18 | 14:58} — | 186! 145° | 72°6/70-9| 9:1 | 122 | 5 M | 2 | 134-1| 
9 | 3°75 |3 M 79°6| 19:1 | 14:88! — |19°8/14°5 | 73-2) 825] 98 |119 | 5 M | 3 /121-4| 
“15 3°25 | 2°8 M-R 85°9 | 19°1 | 14°6 | 76°5 | 19°1 | 14°4 73°6 10°6 | 12 6 | M ee 113°2 | 
| 3 3°25 R 90 /19 | 158] 763/19 |14:5 | — | 76-8| 9°65/11°75} 5 | M | 1 | 192-7 
6 3 2°75 M 80 20 152 = 20°5 | 15°2® | 74-1 | 75-9 | 11°3 | 12°2 5 M 1 107°9 | 
6 3 2°5 M 89°7 | 18°6 | 14°4 | 77°4 | 18°64) 14-48 | — | 72-9] 8:1 | 12°3 5 M |; 1 | 181°] 
. 3°5 | 2°7 M 75 18°4 14°6 - 18°7 | 14°4 | 77 71°2 | 8-4 12°3 5 M i 146°4 | 
95 3°25 | 2°35 M 77°5 | 19°2| 14°8 | 77-1 | 19°2 | 14°6 776| 9h | 11-1 5 P | 1 |121-9 
6 4 2°7 g 108°8 | 19°8 | 144 | 72°7 | 19°7 | 14°5 — |73°3| 96. | 126 5 M 6} ft }aare 
“65 3°25 | 38 M 108°9 | 19°5 | 14°28 | 72°8 | 19°5* 14-2 — | 72°2|106 {| 12°5 5 S 2 1179 | 
“5 3 3 M 91°1 | 19 15 — *| 19*2:| 15 78°1| 70°} 89 | 12:1 5 M 1 135°9 | 
| 3°5 | 3°68 M 75°8 | 18°6 | 13°7 — | 19-1} 141 | 73°8| 75°6| 10°5 | 10°85} 5 M 1 | 103°3 | 
+] 26. | 2 R 90°8 | 17°8 | 14 78°7 | 17°38 | 14 — |70°| 99 | 10°8 5 Ss 3 109-1 
o 3 | 31 M 73°1 | 20°8/}15°5 | — | 212/155 | 721/744) 116 | 125 | 5 Ss 2 | 107°7 | 
6 3°75 | 2°9 M 75°2 | 19°6 | 15°3 | 78°1 | 19°6 | 15°2 — |71°3| 84 11°9 5 M 1 141°6 
26 | 2°5 M 86°8 | 19°1 | 14 — | 19°7| 13°7 | 69°5| 83°8| 9°99 | 11-2 5 M 3 113°1 
2-5 |3°3 R 91°8 | 21°1 | 15°1 - | 21°4 | 15°14 | 70°35 | 79°3 | 11°25 | 12°7 5 M 2 112°9 
6 3 2°8 t 84°1 | 19°4| 15°2 | 78°8 | 19°4 | 15-2 - | 76°71 | 129 | 11-1 6 Ss 5 86 
“15 25 | 3°5 M-R 94°5 | 18°4| 14°5 | 73°3 | 18°4 | 14°5 - | 70°8 | 13° | 11°4 6 Ss 3 86°3 
7 | 3°56 1 39 N-M 78°6 | 19°2 | 13°88 | 71°3 | 19 13°8 — | 73°8 | 108 11°7 5 M 1 117 
4a | 2-5 | 3a N 73°6 | 18 12 — |18°2/13°8 | 75°8| 79:1 | 10°6 | 10°3 5 S 1 97°1 
<a 4°35 | 3°4 N 66°6 | 20°5 | 16 78 20°5 | 15°8 -— 71°6 | 10°2 12°2 5 M 3 119°6 
4 377 | 2°5 M 90°9 | 18°7 | 13°8 — | 188! 14°1 | 75 75°2| 9% 11°4 a) M a 120 
4 2°6 R 90 18°6 | 15 - |18°8 |) 14°5 | 77-2 | 75:2 | 11°1 10°9 5 Ss l 98°2 
5 3°75 | 32 M 79:2 22:2| 165 _ | 22-5| 16°5* | 73°3| 81°83] 111 | 126 | 5 P 1 | 113° 
3 3°52 | 3:1 M 84:9 19 13°58 - | 19°2| 13°5 | 70°3| 73°8|10°1 | 11°2 | 5/6 S 3 | 110°9 
“7 3°5 | 2°8 M 88 20°2 | 15°2 — | 20°7 | 15°2" | 73°4| 76°7 | 10°2 | 12°2 ) M 2 | 119°6 
3 2°6 t 100 | 21 155 — | 213) 15°6 | 73-2 | 8171} 1171 | 12 5 M 3 | 108°1 
+5 3°5 | 2°9 N 85°8 | 18 13* 72°2 | 18" | 138 78°3)| 83 | 41-7 5 M 4 133 
5 3°75 | 2°7 M 85 19°8 | 15°4 | 77°8 | 19°8 | 15°2 76°9 | 12°1 11°3 6 M 4 93°3 
3°5 | 3:3 R 86°2 | 21°3 | 16 75°1 | 21°2| 16 — | 76:1} 11 13°7 5 M ,3 | 1245 
“4 3 2°6 N 79°6 | 19°5 | 14°5 + | 73°8 | 19° | 14°5 — | 84°4| 10°35 | 10°25 5 Ss 2 99 
4 2°7 M 87°7 | 21°0 | 15°6 —— | 21°2 | 15°75 | '74°3 | 82°3| 7°7 12°2 5 P l 158°4 
9 3°5 | 3°5 N 83°3 | 20°1 | 15 74°6 20 15:1 — | 78°8; 95 10°8 5 M 3 113°7 
3 . 3°7 | 3°0 N 68°9 | 19°4| 13°2 | 68 19°1 | 13°7 — 73°2 10°9 | 11°45 5 Ss 4 105 
21. Ossification, left ilio-sacral synchondrosis. 26. Large pelvis, heavy bones. 30. Six sac. vert.,
one from coccyx(?). False promontory. 31. Six sac. vert., one from lumbar. False promontory.
33. Small pelvis, coccyx ossified to sacrum. 34. Sacral angle acute (3rd vertebra) (Plate VII, 9).
35. Narrow flat inlet. 37. Large pelvis. 38. Left sacral ala rises to 3.1 cms. above body and
articulates with last lumbar. 42. Six sac. vert., one from lumbar(?). False promontory. .*. Small
pelvis. “Male type.” 46. “Male type” (Plate III, 3). 47. “Male type.”


68. Six sac. vert., one from coccyx (?).
canal open from 1st sacral down.

No. | U.S.N.M. No. | Tribe or Locality
48 | 213,331 | Arizona
49 | 229,363 | ”
50 | 239,201 | ”
51 | 239,202 | ”
52 | 239,203 | ”
53 | 239,204 | ”
54 | 239,215 | ”
55 | 239,291 | ”
56 | 239,293 | ”
57 | 239,298 | ”
58 | 239,305 | ”
59 | 239,309 | ”
60 | 239,318 | ”
61 | 239,333 | ”
62 | 239,348 | ”
63 | 226,292 | ”
64 | 239,385 | ”
65 | 239,446 | ”
66 | 239,453 | ”
67 | 239,474 | ”
68 | 255,129 | Arkansas
69 | 258,768 | ”
70 | 259,301 | ”
71 | 225,253 | Choctaw
72 | 49,735 | Colorado
73 | 225,214 | Comanche
74 | 248,579 | Eskimo (St Michaels)
Peabody M.
75 | 12,804 | Tennessee
U.S.N.M.
76 | 227,434 | Illinois
77 | 227,440 | ”
78 | 227,441 | ”
79 | 227,445 | ”
80 | 227,448 | ”
81 | 227,450 | ”
82 | 225,420 | Kentucky
83 | 225,422 | ”
84 | 225,421 | ”
85 | 225,423 | ”
86 | 225,425 | ”
87 | 255,105 | Louisiana
88 | 255,214 | ”
89 | 255,216 | ”
90 | 216,213 | Mexico
91 | 228,925 | New Mexico
92 | 228,950 | ”
93 | 228,967 | ”
51. 


(Plate IV, 4) sacrum. 
57. Double promontory (Plate IV, 4). 
Extreme “female type” (Plate IV, 4)
Table of Pelvic Measurements of

Transitional vertebra. 
59. Narrow arch. 


| 


INLET
12°8 | 11 9°3 | 1°7 | 72° 
14°7 | 122/102 |2 | 69° 
14°3 | 12°2|10°6 | 1°6| 74: 
13-3 | LLL | 8-9 | 2-2) 67 
113 12°1| 102 | 19 | 78°5 
112-4 |10°4| 93 | 1°1| 75 
13 12°2/}10°25|2 | 785 
12°25 | 12°3 | 10°85 | 1°5 | 88°5 
13°2 | 12°4|10°7 |1°7} 81 
13°1 | 12°5| 10 2°5 | 703 
13°4 | 10°8| 99 | 0°9| 73°9 
12°3 | 13-2 | 11°5* | 1-7 | 93°7 
12°3 | 10°1| 85 | 1°6| 69°71 
14°2 | 1271/1071 |2 | 71:1 
10-7 |10°%5| 85 |2 | 79:4 
sb 9-2 | 1°8| 70-7 
12°3 | 12°3/ 91 | 32) 74 
13°6 | 1174] 9 2°4 | 66°2 
13°7 | 11°3/ 10 13 | 73 
13 99 | ga 1°9 | 61°35 
13°5 | 13°8/11°5 | 2°3 | 85-2 
12°1 | 12°8/ 11 1:8 91 
13°6 | 12°8} 11 1°8 | 80°9 
|126 | 13 | 10°7 | 2°3 | 846 | 
a's. | 3n-3 | Sa 2°5 | 65°4 
13. | 111] 83 | 2°} 63°8 
13°6 | 14°3/ 11-8 | 2°5 | 86°8 

| 

12°38 |13°5 115 |2 fees 
12-2 |10 86 | 1:4] 70°5 
12°5 | 116} 9:4 | 2°2| 75-2 
11°5 | 11 9°7 | 13/843 
113°3 | 14 | 128 2 | 9072 
12°1 | 13°6|11°8 | 1°8/| 97-5 | 
12°8 | 13°3 | 11°75 | 16 | 91°8 
12°8 | 116} 10°71 | 15 | 78-9) 
13°4 {11:1} 9°8 |1°3/ 73:1 
12°6 ) 11-7 10-4 | 1°3 | 82°8 
12°6 | 11°9| 10-2 | 1°7| 81°3 
12°8 | 135/12 1°5 | 93°3 
13°3 | 11 9°2 | 1°8 | 69°2 
13-1 | 13°1|11°3 | 1°8 | 86-2 
13°8 | 12°4/10°6 | 1°8| 76°8 
115 | 132/111 | 21 | 9672 
13°5 | 12°2|10°8 | 1°4| 80 
12°4 | 11-9] 10 1-9 | 80°6 
13 12°8| 10°4 | 2°4| 80 
53. Six sac. vert., one


73. Rather flat pelvis (Plate II, 2). 








Exostoses on sacrum. 
62. Pygmy or very small pelvis (Plate I, 1). 
71. Coccyx ankylosed to sacrum (not included). 


60. 


PELVIC
9°5 6°75 | 5°3 
12°78 | 8° | 5°7 
10°9 82 16 
10°5 66 | 65 
10°2 82 | 4°8 
9°2 8 6°2 
10°7 8 | 5°38 
92 | 75 | 474 
10°77 | 81 |5 
10°9 8°3 3°7 
10°3 8°7 4°5 
Ss 75 | 5°7 
9°7* 8°e 2°9 
112 | 76 | 54 
8°3 6°8 | 4°6 
9°8 75 4°8 
9°6 66 (5 
10°4 6°2 5°8 
10°3 7°77 | 5% 
8°5 67 5°2 
10°6 9°1 3°7 
9°8 8 | 5°5 
11°4 76 |3°9 
9 6°5 | 66 
11°4 7 4-4 
8°5 66 | 3°7 
8°75 | 8°25 | 6-5 
10°2 7°6* | 4°6 
oe | 7 5°3 
9°35 | 7 59 
8°85] 81 4 
10°3 8°8 | 5:2 
91 | 7 |61 
8°85 | 8 51 
mot & | 6°5 
9 8* 69 
10°85 | 8°7 | 4°5 
8°6 6°9 6°4 
10°8 81 5°8 
9°8* | 6°78 | 6:3 
10 72 | 53 
10 871 | 6°1 
10°2 13% 184 
10°35 | 7°5 | 4°6 
10 73 16 
7°5 |-5° 


from coccyx (?). 

Four sac. vert. 
67. Small pelvis. 
72. Spinal 
74. Six sac. vert., one from 
American Indian Squaws (continued).

| | 
OUTLET | INNOMINATE BONE | SACRUM
3 2°5 N-M 89°6 | 18°4 | 14:2 - 18°6 | 13°9 | 72°2| 71°2| 9-7 | 10°9 5 M 1 112-4 | 
3°5 | 3°4 R 90°8 | 20°4* 15°5 — | 20°4)15°5 | 75°9| 72°3/ 10°38 | 11°5* 5 Ss 4 | 1065 
3°75 | 2°7 R 86°5 | 19°3 | 15°5 ~ |19°4]15°5 | 79°9|70°5| 96 |11°6 | 5 M 2 | 120°8 
35 | 35 M 85°3 | 18°7 | 14°71 | — | 19°1| 14°1 | 73°8| 73°4| 11-4 | 10°8 9/4 Ss 2 94°7 
| 4°25 | 3°2 R 89°5 | 20°2 | 15°4 | 76°4| 20-2 | 15°2 — |72°1| 9 11°5 5 M 1 127°8 
3°25 | 2°4 R-M 73 | 18°7 | 13°9 | 79°7 | 18°7 | 13°8 -- | 72°6|10°9 | 10°9 6 Ss 3 100 
3°25 | 2°7 R 98°1 | 18°5 | 13°4 — |185|13°7 |74 |74 |11°77 | 109 5 M 1 | 92°3 
| 3°25; 2°6 | R 87°6 | 18°83} 13°9 | — | 19°1|13°6 | 71:2] 75°6| 9°9 | 10°6 5) M 3 107 
3°5 | 32 M 89-2 | 18°5 | 13°8 | 74:6 | 18°5 | 13°8 — |70°4| 99 | 113 5 Ss 2 | 114°1 
|3°5 | 2°6 R 100 19°6 | 14°2 — | 19°8 | 13°7 | 69°2| 77°6| 10°9 | 11°3 5 M 2 | 103°7 
4 {2 R 92°4/19" |14 | — |19 14 73°7 | 73°38] 8 11-1 5 N 1 | 138°8 
3 | 2°7 N 64 19°4 | 14°3 | 73°7 | 19°49 14-38 | — | 7173} 10°9 | 11°5 5 M 1 11871 
3°5 | 2°6 i 94°6 | 17°2 | 13 _ 17°3 12°9 | 74°75 | 75°2| 7°25j{ 10-9" } M 1 150°3 
3°) | 3 v 94°9 ; 19°8 | 15°3 19°9 | 15°4 | 77°4| 69°83! 10°5 | 12°4 5 M 2 118-1 
2°38 | 2°7 M 80°9 | 16 11°3 | 70°6 | 15°7 | 11°15| — | 76°9 | 79 9°2 5 Ds) 3 116°5 
3°5 | 2°5 M 85°1 | 18°3 | 13°5* | 73°8 | 18°3 | 13°5* - | 70-4) 9°4 11 5 | Ss 3 117 
2°75 | 3°2 M 95°6 | 18°38 | 14:8 | 78-7 | 18°7 | 14:7 - 42:5:)- 97 | 107 S | M 2 110°3 
4 |31 M 99 | 19-4 | 13-9 19°5 | 13-9 | 70°8|75°7| 9-4 [121 | 5 | M 1 | 128-7 
4 |28 N-M 87°7 | 18°7 | 14 — | 18°7| 14-1 | 75-4 | 68 85 6 11°6 5 | M 2 | 136° 
3°25 | 3°6* N 77°3 | 18°1 | 13°6" 18°1 | 13°6 | 75°1 | 69°6 | 8-2 10°8 ) M 1 131°7 
3 | 3°3 R 82°2 | 20°8 | 15°25 - |21 |15°5 | 73°8 | 76-4 | 10°8 12°7 6 M 2 117°6 
3°75 | 3°5 M-R 85°2 | 19°2 | 14°6 — | 19°3 | 14°6 | 75°6 | 72°8 | 9:2 11°45 5 P 1 124°5 
3 3°2 M 103°6 | 20°7 | 14°48 | — | 20°7 | 14°4 69°5 | 82°3 | 10°3) | 11°7 5) M 1 113°6 
3°25 | 3°5 M 75°6 | 18°2 | 15°3 | 84-1 | 18°1 | 15°2 — | 67°7|10°9 | 11° 5 s 1 105°5 
—Vitad ies a t 103°6 | 19 41 | — | 19°1 | 14:1 73°8 73°6 10 11-2 5 M 1 112 
3°25 | 32 | M 90°4 | 19°1 | 14°4 | 75°8 | 19 14°1 —- 72°8 |} 11°8 | 113 3) M 2 95°8 
35 138 | N 65°3 | 21°9| 15-7 | 71°3/91°7/15'1 | — 789/109 | 11-4 | 6 M 2 | 1046 
| | 
| | | 
a7 |22 | R 89°5|18°8 14:1 | — |19 |13°8 |72°6)76 | 10-9") 115 | ¢ M | 2 | 105% 
| on | 
3°75 |2°85| 2 83°3/17°5 129 | — | 17-7) 1275/72 80-4) 7°75) 103 | 5 P 1 | 132-9 
3°5 | 3°6 | N-M 79°2 | 19°2 | 14 — | 19°5| 148 79°6 | 75°35 | 82 | 11°3 6 P 3 137 38 
3 2°9 N-M 80°4 | 18°48 13°9 — | 18°4| 13°98 | 70°1 | 84 78 | 10°5* 5 M 2 134°6 
3°5 ? | M 85°8 | 20°5 | 15°2 | 74°1 | 20°4 | ia°2 - | 84°2/ 11 115 | 5+? M 3 104°5 
3 3°4 N 79°1 | 18°6 | 14°5 — | 18°6| 14°58 | 77-9 | 74°4| 8°5 | 10°35 | 5 ¥ 3 121°8 
35 | 3°6 | N 76°9 | 20°8 | 15°4 | 74 | 20°6 | 15°4 - 87°1; 10°2 | 11°4 5) M 1 111 “1 
3° |3 | N 72°1 | 19°3 | 18°5 — | 19°4| 13°53 | 69°6 | 72°4| 10°3 | 12°7 5 M 1 123°3 
27 | 31 | R 74°4 | 20* | 15°5 | 77°5| 20 15°3 —- 70°7 | 12°5 | 11°7 5/6 Ss 4 93°7 
25 | 26 | R 90°8 | 18 14°6° || — | 18°6 | 14°6* | 78°5 | 76°8 | 1l0°2® | -11°7 5 M 3 114°7 
3°75 | 3- M 81 | 19°2 | 14°7 | 76°5 | 19°2%| 14°7 — 73 10°1" | 11°2 5 Ss 1 110°9 
35 |35"| R 748/20 |15°9 | 79°5| 20" | 15-98 | — | 755/103 (10-7 | 5 M 1 | 103-9 
3°754) 3°2 M 89°1 | 18°1 | 14°78 | 81°2 | 18°15 14°78 | — | 63-7) 9 12°18 5 M 2 134°4 
3 | 3 M 91°7|19°2| 145 | — 19°3 | 14:5 | 75°1 | 72°8| 10°5 | 12 6 M 2 114°3 
4 | 3°3 M 81°9 | 20 15°15 | 75°7 | 19°9| 15-4 | — | 776/115 | 12 6 M 2 104°4 
325/34 | R 91*1 | 18-9 | 14 | = | 19°2| 13°38 | 71°8| 784/108 105 | 5 M 3 | 972 
3°25) 3°1 | M 94°1 | 19°4 | 13°8% | 71°1 | 19°4%) 13°8 -— | 746|10°9 | 11°6 5 M 3 106°4 
3°4 | 371 M 91°3 | 18°9 | 13°65 | 72°2 | 18°9 | 13°6* - |72°7| 84 | 11°71 5 M 2 132-1 
3 3 M 75°8 | 19°3 | 13°18 | 67-8 | 19°34 13-12 | — | 71:5] 97 | 11°1 5 M 2 114°4 
| | | | — 
lumbar(?) (Plate IV, 4). 75. Six sac. vert., all sacral in character. 77. (Plate V, 5), six sac. vert.,
one from coccyx. 1st lumbar forms false promontory. 78. Small pelvis. 79. Suggestive of
six sac. vert., one from coccyx. 82. Coccyx ossified to sacrum. 83. 5th lumbar sacralized left,
free, right. False promontory. 86. Round pelvic inlet (Plate II, 2). 88. Six sac. vert., one from
coccyx (?). 89. Six sac. vert., one from coccyx(?). 92. Coccyx ossified to sacrum.
No. | U.S.N.M. No.
94 | 246,968
95 | 225,215
96 | 204,254
97 | 225,213
98 | 225,250
99 | 227,006
100 | 227,007
101 | 227,011
102 | 227,014
103 | 227,024
104 | 227,432
105 | 242,513
106 | 242,568
107 | 225,217
108 | 225,218
109 | 225,221
110 | 225,242
111 | 225,261
112 | 225,415
113 | 225,416
114 | 169,672
115 | 98,470
A.M.N.H., N.Y.
116 | H, 3658
117 | ”
118 | ”
119 | ”
120 | ”
121 | H, 102
122 | H, 191
123 | H, 271
124 | H, 307
125 | H, 321
126 | 99/2393
127 | 99/2230
128 | 99/2165
129 | 99/305
130 | 99/2506
131 | 99/2426
132 | 99/2542
133 | 99/2330
134 | 99/2325
135 | 99/2539
136 | 99/2329
137 | 99/2544
138 | 99/2486


a on oo oe oe 


94. Six 


sacral in chars 


coccyx (7%), 


A Study of the Variations in the Female Pelvis 


Table of Pelvic Measurements of





Tribe or Locality | Inter-crests, cms. | Inter-spines | INLET | PELVIC
New Mexico | 27°5 | 23:8 | 144 | 133 |10 | 25 | 75 |126 |11 | 10 | 8 5°75 
Pah Ute 25°5 | 22°2 | 12°7 | 12° | 10% | 2 82°7 | 13°5 | 14 Ye 
Peru 27° 21°5 13°8 12°2 9°25; 2°9 67 11°3 10°25 9 | 75 | 53 
é 26°5 | 22°25) 12-5 | 10-4 | 8 24 64 | 11-9 | 10 86 | 81 | 55 
‘ 25 (20% |129 | 12 95 | 25 | 736/12 | 11 10°3 | 7:78 6 
te 246 |20 | 12 12°3 | 10°8 1°5 90 | 1171 | 10 96 | 63 | 5°7 
‘as 27 24°5 | 135 14 11°6 2°4 86 11°75 | 12 11°75 | 7:4 | 57 
“s 25°3 | 233 |12°3 | 121 | 105 | 1-6 | g5-4| 19-5" | 118 9-5" | 8-4" 4-18 
” 26 22 | 13°2 12°8 | 11°25 16 85-2 | 11°3 11* 10 | @& 5°5 
a 27°3 | 26°5 | 14:3 | 13°L | 11°35) 1°8 | 79:4] 13°6 | 12°5 | 11°5 8°8 | 6°l 
a 24°5 | 22°3 | 12 11-4 | 9 2°4 75 |11°5 | 9° 8°1 7 6 
- 25 21°56 | 11°5 11°2 91 2°1 79°1 | 11°5® | 11°5* | 11 7°28 | 5:3 
“ 24°55 | 22 (118 | 116 | 94 | 2-2 | 7Q7/ 11-1 | 9 8°3 | 63 | 6:4 
Sioux 27°1 | 23 13° 13°2 | 11°6 16 88°5 | L1l*+ 12 L0°7 85 | 3°9 
99 28°5 | 25 13°9 13°3 | 10°1 3°2 722 |} its | 74 9°7 8 5°] 
2 27°8 | 24°1 12°1 13°3 | 11‘ 2°2 91°7 | 11 11 9°8 76 | 47 
a 28 25°5 12°6 13°2 | 11:2 2 88°8 | L2°4 11°5 9°7 84 | 5°] 
‘a 27 23°5 13 12°3 | 10 2°3 76°9 | 10°8 11 9°5 ta | 4] 
5 29°1 | 25 14°1 14°4 | 12 2°4 85°1 | 13°2 11°25 | 10°6 76 | 6% 
49 28 24°1 12°75 | 11°8' | 10°15 1*7 79°6 | 12°5 | 10°5 9°4 7°75 | 6'4 
Virginia 28* 24°5* | 13°28 | 11°8 | 10°28 16 77°3 | il] 1] 9°9 72 | 4% 
Wisconsin 25°5 | 22°6 13°25 : 10°85 ? 81°9 | 10°8 10°3 9°25 | 7°15 | 5°3 
| | | 
Pueblo, N.M.! 27 23°25 | 13 13°3 | 11°2 2°1 86°2 | 12°7 12°5 11 972 |5 
‘in 24°5 | 21°8 12°5 12°2 | 1l 1°2 88 10°6 10°5 10°2 6°5 | 5°5 
" 25°2 | 23 13°4 12°5 | 10°7 1:8 79°9 | 11°8 | 12°5 111 76 | 53 
% 24°9 | 238 13°1 | 11°4 | 9°5 1°9 725 |12. {10 | 9 774 |5°9 
e 27°3 | 25°4 13°3 12°3 10°4 19 77°4112 | 10°5 8°5 7°5 5°2 
South Utah | 23°8 |19°3 | 12-4 | 11 9 2 72°6 | 11°6 {125 | 11 974 | 4 
~ 22 20 12 11°6 | 10°4 14 86°7|11°3 | 11 | LO‘4 77 | 4°8 
a 26°5 | 23 13°1L | 11°8 | 10°1 1‘7 77°1/11°3 {10 | 96 72 | 58 
” 27 23°4 14 12°] 10°6 1°5 75°7 | 11°6 105 | 9°7 71 6°2 
ss 25°7 | 22°9 | 13°1 10°2 9 12 68°7 | 10°3 | 10°5 9°7 6°8 | 4°7 
Mexico 24°5 | 22 12 116 9°9 1-7 82°5 | 11°2 | 10°5 9°5 PL | Re 
- 25 20°5 12°9 13°3 | 11°8 15 91°5 | 11°4 10 9 7 55 
9 26 23°7 12°5 11°5 | 10 Ld 80 11 10°5 8°9 75 |5 
” 25°6 | 22 13°2 Id Sp 9°6 16 72°7 | 10°3 10 | 9-2 66 | 4°6 
» 27 24°1 12°1 13°1 15 1°6 95 11°8 95 | 8:2 TS 6 
i 26°9 | 24°6 13°1 12°] 10°5 1°6 80°1 | 10°2 9°5 9°2 6°1 5°3 
mA 25°6 | 22 12°8 13°2 | 11°9 1°3 93 11°2 10 9°2 67 | 54 
2 25°1 | 23°5 13 149 | 13 1-9 100 12°6 1] 9°5 8°5 | 5°3 
a 25°1 | 21°4 13 12°8 | 11°3 1°5 86°9 | 11°6 11 10 774 | 62 
5 28 24 13°5 12°8 | 11 1°8 81°5 | 12°7 1] 9°1 8'1 5°8 
‘a 25°1 | 23°1 | 13°1 13°7 | 11°6 2°1 88°5 = 12°5 1] 9°4 8°3 | 4°5 
” 26°1 | 21 12°5 12°9 | 11 1°9 88 10°3 10°5 9°7 8 4°4 
* 23°2 | 22 12°6 11°9 | 10 19 79°4 | 11°5 10 9 7 4°6 
94. Six sac. vert. (Plate IV, 4). False promontory. Spinal canal of sacrum open. 95. Shallow
pelvis, wide outlet (Plate III, 3). 98. Six sac. vert., one from coccyx(?). 99. Six sac. vert., all
sacral in character (Plate IV, 4). 100. False promontory (Plate VI, 6). 102. Six sac. vert., one from
coccyx (?). 104. Six sac. vert., one from coccyx (?). 105. Last sacral segment missing.


106. Transitional vertebra. No false promontory. 
107. Large pelvis (Plate I, 1). 108. Large pelvis. 109. 
segments (Plate IV, 4). 


Small pelvic cavity (Plate IV, 4, Plate VI, 7). 
Transitional vertebra, left=6, right=5
111. Coccyx ossified to sacrum (not included). 116. Wide outlet (bones dark
American Indian Squaws (continued).

OUTLET | INNOMINATE BONE | SACRUM
3°25 | 3 M 79°4 | 20°8 | 15°35 | 73°8 | 20°7 | 15°25 | — | 72 11°9 |11°9 6 P 2 100 
| 3°75 | 3°1 R 82°2 | 19°8| 148 | — | 199 | 14°39 | 74°9| 776) 75 | 11°6 5 4 1 154°7 
| 4°25 | 3°15 N 796 | 20°7 | 15°2 — | 20°7| 15:3 | 73°9 | 76-2 | 10°7 | 12°4 5 M 3 | 115°9] 
| 3°25 | 2°4 N 72°3 | 184) 14:5 | 734/183 ]14:2 | - 69°4  =8°85 | 10°7 5 M 2 120°9 | 
13 2°2 M 82°4 | 19°8 | 15 — }20 | 15:1 75°5 | 80 10°5* | 11°9 6 Ss 2 113°3 
3 |33 M 86°5 | 18°7| 144 177 | 1841144 | — |76 | 97 1105 | 6 M 1 | 108-2 | 
3°25 | 3°2 M 100 20°1 | 14°5 — | 20°2 | 14°8* | 73°3 | 74°8 | 10°6 | 11°7 5 M 3 | 110°4 | 
| 3°258) 3® M | 76 18°2 | 14 76°9| 182/14 | — | 70-2) 9 11°5 5 M 1 | 127°8 
3°25 | 2°8 M | 88°5 | 18°7 | 14°6 — |18°7|14°7 | 786) 719! 9°49 | 11°5 6 Ss 1 111°7 
3°25 | 3°5 R | 87°4 | 20-2 | 15°6 — |20°4|15°1 | 74 74°7 | 10 11°7 5 M l 117 
3:1 |4 N 76°4 | 18°6 | 13°75 | 73°9 | 18°3 | 13°75 759, 9 11°2 6 M 2 124°4 
13°25 /3 R | 95°7 | 17°4| 13°25] -—— | 17°45, 136 | 78°1| 69°6| 8-74 | 10 5 M 1 114°9 
2°75 | 3 M 74°8\18 | 14:1 78°3 | 18 14°1 73°5 | 9°8 10°5 »/6 M 2 107°7 
3°75 | 3°1 R 93°9 | 20°4 | 16°1 — | 20°56 |16°1 | 78°5| 75°6| 99 | 13°2 5 M 1 133°3 | 
4°75 | 3°4 N 82°3 | 21°3| 16°6 | — | 21°4| 15°8 | 73°8)| 75:1! 96 | 11°3 5 P 1 117°7 | 
3° «| 3°3 N | 89°1 | 20°1 | 15°6 | 77°6| 21-1 | 15°d — | 75°99; 10°38 | 11°5 | 5/6 M 3 106°6 
4 3°2 } 78°2 | 20°2 | 15°5 — 20°5 | 15°3 | 74°6 | 73°2: 1O°S =| :12°1 5 Ss 1 115°2 | 
3°25 | 2°9 M 87°9 | 20 15°1 | 75°5 | 19°9 | 15°1 74°1| 98 |.11°5 5 M 1 117°3 | 
3 |3°7 N 80°3 | 21°5 | 16°9 | 78°6 | 21°5 | 16°8 73°8 | 12 12°15} 5 M 2 | 101-2] 
13°4 | 3°8 N 75°2|20 | 16 80 19°4 | 15°85 774} 8-1 12 ) P 1 148°1 | 
2°75 | 3 M 89°2 | 18°8 | 15 — |19°1 | 15* 78°5 | 68°2| 99 | 11°9 5 M 3 120°2 | 
2°8 | 3°25 N 85°6 | 20 14°7 73°5 | 19°7 | 14°4 — | 78°4| 10% 11°8 5 M 2 112°4 | 
et | | 
| | act | 
4:3 | 3°3 M 86°6 | 2071 | 15°4 _ 20°71 | 15° | 77-1 | 74°4 | 10°71 1174 5 8 3 112°9 
|3°7 | 2°5 M 96°2 | 18°5 | 14°1 — |185)|14°3 | 77°3| 75°5| 9-7 | 11° 5 M 2 117°5 
3°3 | 2 H 94°1 | 18°8 | 15°4 — |18°8|15°6 | 77°7 | 74°6 | 10°4 | 11°5 5 M 3 110°6 
3°8 | 19 M 75 =| 189) 14:2" | — | 19 14°2* | 74°7 | 76°3 | 108 11°9 5 M 1 119 
3 16 M 70°8 | 182/147 | — | 191152 | 796) 70 | 113 | 117 | 5 Ss 2 | 103°5 
i 2°6 R 94°8 | 18°3 | 13°3 | 72°7 | 18°2 | 13°3* — |76°9| 92 | 11 5 M 3 119°6 
4 12 M 92 19°6 | 14°2 19°8 | 14°2 | 71°7 | 90 82 | 10°4 4 M l 126°8 
35 | 2:1 t 84°8 | 19°2| 14°8 | 77°1 | 19:2 | 14°8 72°35 |} 11°3 | 11°3 5 Reverse | 5(?) | 100 
471 |2°2 M 83°6 | 20 14°3 — 20°3:| 14°6 | 71°9 | 75°2 | 10°8 | 12°1 6 M 2 112 
3°35 «| 2°9 M 94°1 | 18°3 | 13°8 | 75:4] 18°1 | 13°8 71°2|11°8 | 10°7 6 PS) 5 90°7 
36 | Ll t 84°8 | 18 13°7 | 761/18 13°7 73°5 | 91 LO°4 9) M 1 114°3 
1 2°7 N 79 19°4 | 14°9 19°7 | 14°7 | 74°6 | 78°8 | 11°1 11°5 ) M 3 103°6 
3°8 | 2°9 M 80°99} 19 | 14°1 | 74:2 | 18-9 | 14°3 wae i ee ae 5 s 3 | 1134 
3 2°5 M 97°1 | 18°9|13°6 | — | 19-1) 138°4 | 70°1 | 74°9| 10°1 | 10°8 5 M 1 | 106-9 
4 2°4 WV 69°5 | 19°6 | 14°4 | 73°5 | 19°4 | 14°3 72°6 10 12°] 9) M 1 121 
3° | 2°1 N 90°2 | 19 15 78°9 | 19 14°8 706; 9:1 11°3 3) 4 1 124°2 
2°7 t M 82°1 | 19°3| 14°3 | 74°71) 19 14°3 - | 40°4 | 11°32 11°7 5/6 mS 104°5 
3°2 | 2°5 M 75°4 | 20°6 | 15 | 72°6 | 20°4 | 15 82°1 | 10°1 12°2 ) M 2 120°8 
4:7 | 2°8 M 86°2 | 18°8 | 14°7 | 78°2 | i8-8 | 14°5 | 74:9, 96 | 11°5 5 M 1 | 119°8 
3°7 | 2°5 M 86°6 | 19°9 | 15° — |20°4/15°3 | 75 72°9| 106 | 12°5 5 SS) 4 117°9 
3°8 | 2°5 M 75°2|20 | 14 — | 20°1| 14°1 | 70°71 | 80°1| 10-2 | 12°5 5 M 2 | 122°5 
3 2°2 M 94°1 | 19°9| 14°5 | 72°9/19°9|14°5 | - 76°2 | 10 12°1 a) P 1 | 121 
25 | 3°3 M 78°2 | 18°9 | 14°1 | 79°9 | 18°8 | 14°3 — | 81°4| 10-4 | 10°7* 5 M l 102°9 
| | | | 
colour, identification). 117. Bones light colour, identification. 118. Sacral canal open
1st and 2nd sac. vert. behind. 119. Exostoses, 1st sacral vert. left. 120. 3rd, 4th and 5th sacral
segments broken on right. 122. Four sac. vert., sacrum complete. 123. False promontory.
Slight reverse curve of sacrum. 124. Six sac. vert., all sacral, double promontory. 125. Six sac.
vert., all sacral, false promontory. 127. Double promontory, “Male type” of pelvis. 130. Small
outlet. 132. Transitional vertebra, right sacralized, left free, false promontory.
Table of Pelvic Measurements of

No. | A.M.N.H., N.Y., No. | Tribe or Locality | Inter-crests, cms. | Inter-spines | INLET | PELVIC
189 | H,15075 | Apache (Arizona) | 27 | 214 | 133 | 14°3 | 11°5 | 2°8 | 86°5 | 12°8 | 125/105] 8-9 | 5-2 
140 | H, 15098 | Pueblo |96°5 | 23°35" | 14 11:3} 9°5 | 1°8 | 67°8| 12 11°5 | 10 8° | 55 
141 | H, 15124 Arizona 27 23°7 | 18°1 | 11°1 | 9-4} 1°7 | 717 | 10°8 | 11. | 10-2] 7 5 
142 | H, 15094 | a 24 |21 | 12°1] 12:4] 10°5|1°9| 86°8| 11°99 111 | 99] 7:9 | 46 
143 | H, 15154 | 4 125°5 | 22 14 |10°8; 9:2} 1°6| 65-7 | 12® | 118 | 10:2] 73 | 5:3 
144 | H, 15124 | a | 25 22°2 |13 |106] 9:2|1°4/70°7| 11°4 |10 | 10 66 | 63 
| 145 | H,16035 | S.E. Utah | 24 19 11°6 | 11°8 | 10-9 | 0-9 | 93-9 | 10 9 8:4) 66 | 45 
146 | 99/106 | Eskimo 24°5 |21°8 [121] 9:4] 7:5) 1°9| 61-2| 11°6 | 11 8°8| 8 4°] 
147 | 99/3743 | Brit. Columbia | 28 24°19 | 14 | 13°6 | 12°3| 1°3 | 87°9| 13°4 | 11-5} 11-2] 66 | 7:5 
148 | 99/3737 | * 25°4 | 23 | 13° | 13°5 | 11-8 | 1-7 | 87-4) 11-4 | 11 | 10 5] 75 | 5 
149 | 99/1614 | ed lotte | 94 | 205 | 11-7| 12°3| 10°8| 1:3 | 92°3 | 10°6 | 10° | 10 | 7 15 
150 | 99/3756 | Brit. Columbia | 27°6 | 24-4 | 13 | 12-4| 105 | 1°9/| 80°8 | 13 11°5| 10°1L| 85 |6 
151 99/1720 | Fort Rupert, B.C. | 23°7 | 19°5 | 13 |11°3| 9 2°3 | 86°9 | 12 12 10°3| 8-2 | 52 
152 | 99/1731 | = 25°38 | 24 13°8 | 11°8| 10°5 | 1°3 | 76-1 | 10°8 | 10°5| 10:3] 6-3 |6 
153 | 99/1727 a 24 22 13°1|} 13-4) 11 | 2:4] 84 | 12°8 | 12 9-9| 9 5 
154 | 99/1741 | a | 24°8 | 22 13-4 | 11°8 | 10°1 | 1°7 | 75°4| 12°5 | 11°5} 101] 7°56 | 55 | 
155 | 99/1670 | Nimpkish, B.C. | 25% (23 | 13) | 143) 12-9) 14) 992) 109 11 | 101) 7 47 | 
156 | 99/1674 | = 124°5 | 22°5 | 13°4| 131/12 | 1°1| 86-9} 13 115| 103] 8 oI) 
157 | 99/1676 | i 26" | 24-2814 | 124/11 | 14/786] 14-2 | 13 | 11-2) 95 | 6-1 | 
U.S.N. M. | | Te | | | 
158 | 225,407 | Sioux 125°8 | 22 11°8 | 12-4] 10°8 | 1°6 | 91°5 | 11°5"} 9-59] 8-54 6:5" | 5°3 | 
159 225,412 ES | 26 22 13°3!11°7} 9° |2°2|71°4] 10°4 |} 10 | 9:2) 6°75|)46 | 
160 | 225,414 | is |26°5 | 24 13°5 | 13°1 | 11°8 | 1°3 | 87°4| 11°3 | 10°5| 9:3] 7:9 | 4:8 | 
161 225,408 | Dakota 26°8 | 22 13°1 | 13°3]11 | 2°3| 84 10°7 | 11 10°1| 7°8 | 4°4 | 
162 | 225,409 | 2 1253 | 21-5 | 13-1 | 12°7| 10°5| 2-2} 801 11-7 | 12°5| 11-4| 88 | 3-75 
163 | 225,406 | Cheyenne 125°5 | 21 13 | 129) 11°3| 16 | 86°9| 121 | 11 10°3| 74 | 61 
164 | 261,810 | Arkansas | 27 25 14:°2;10°8| 9 | 1°8| 641/118 | 9 8'4| 65 | 5°8 
165 | 261,824 | ss }28°2 | 23-5" | 14-5 | 127/11 |1-7|75°9| 13-2 |11 | 105] 75 |7 
166 | *1,C Argentina =| 26 22 12°8 | 135/12 | 15/938) 127 | 105| 97| 7 | 66 
167 | 262,577 Arkansas | 26°75 | 23 135/12 |10 |2 | 74:1| 11°8 | 11 97| 76 | 4:7 
168 | 262,570 = 127 | 28°7 | 125 1127/11 |17/88 | 113/105] 9 | 75 | 45 
169 | 264,488 Peru }26°1 | 235 | 13 |135)115|2 | 88-4) 106 | 105/102] 68 | 48 
170 | 264,489 | = | 26°6 | 23:9 | 125| 12-3) 103) 2 | 82-4) 123 115) 10°8| 73 | 66 
171} 8S Arizona | 25 23 11°9| 10°9| 9-1] 1°8| 76:4 | 10°8 | 9 8'°8| 63 | 56 
172| *70 ss }232 | 191 |12°8|103| 9 | 13/703 11 |10 | 8 3! 75 | 43 
173 | “83 = | 25 23°5 | 11°38) 1114) 9 | 21) 762] 11°6 | 105) 9:4) 7°3 | 53 
174 *161 SS } 24-4 | 191 | 13 | 10-4] 8-3] 2°1/| 63°83) 10 11 | 10 | 6 4°7 
175 2 | se }27°l | 245 | 12°8] 11-2] 9:1] 271) 7171] 12°4 | 105; 8 85 | 52 
176 *3 99 124°5 | 22 12% | 113} 95/18) 76 | 108") 11 9°4| 7°5* | 46 
177 *104 x 26 | 235 | 128) 11-4) 92/22/719)| 114 | 95) 86! 7:3" | 5-2 
ivs| *112 | ss 25°5 | 22-3 | 129/117) 99 1°8| 76°7/ 11°2 |105| 9 | 75 |5 
179 *82 | ‘3 25°3 | 23°5 | 135] 118} 9-7] 271] 71-9) 118 | 105) 97) 72 | 62 | 
180 7 is 25 22°5 |12°9/115| 9 | 25) 69-7) 10:1 | 105) 97) 7 oe 
181 | _)} ; 23°8* | 20°9 | 12 |11°4] 9-1] 2°3| 75-8} 10°3 | 11 99] 7°7 | 5°38 
182 | #5 | 25 23 12°71} 11:1} 9 |2°1| 74:4) 10°8 | 11 9°7| 83 | 4°4 
183 | *Cave 3 | a 27°8 | 24 13-°9| 11°5| 9-2 | 2°3 | 67 10°5 |10°%) 10 | 7:2 | 4:8 
184 | *152 | i 25°9 | 22 13°5 | 12°5 | 10° | 2 | 778| 12 | 1 | 108] 77 | 4°7 
185 | 58 mt 24°5 [22-2 | 13 | 101) 8-4 | 1°7 | 646 | 10-4 | 105 | a4 75 | 4:1 
139. Transitional vertebra, left sacral, right free. Large pelvis. 145. Small pelvis. 146. Flat 
pelvis. 147. Large pelvis. Last lumbar articulates with right wing sacr. 148. Six sac. vert., 1st 
tends toward lumbar type. False promontory. 156. Six sac. vert., last tends toward coccyx. 
157. Transitional vertebra, right free, left sacralized. 158. Six sac. vert., one from lumbar. False 


* Original number. Specimen not yet permanently catalogued. 
American Indian Squaws (continued).

| 
OUTLET | INNOMINATE BONE | SACRUM
3 | M s2 |20-4|16 | — 20°6 | 15°3 | 74:2| 76°3| 11°6 | 12-6 | 4/5 M 3 | 108°6 | 
27) M 83°3 | 19°5 | 14°8 | 75°9/ 19°5 | 14°88) — | 73-6] 82/119! 5 M 1 | 145°1 
3 R 94°4 | 19°2 | 15-2 | 79-1 | 19°24" 14°5 | — | 71-1) 11 11°8 5 M 3 107°3 
2°1 R 83°2|18 | 13°3 -— |18°2| 13°2 | 72°5 | 75 | 10°2 | 11°2 5 Ss 4 109°8 
3 | 2 85 18°4 | 13°8* | 75 18°3 | 13°38 | — | 72°5| 10°5* 11°4 5) Ps) 1 108°6 
2°6 M 87°7 | 18°2| 13°38 | — | 18°3 13°7 | 74°8 | 73°2 | 10-2 11:1 5 ) 1 108°8 
a M | 84 17°7 | 13°4 — |17°9 | 13°6 | 76 | 74:6] 82 | 10°6 ) M ¥ 129°3 
3 M | 75°9/178/ 13:5 | — |18 | 133 |739|73-5| 93/116] 5 S 3 | 124-7 | 
3:2 M_ | 836/205 | 16:2 20°7 | 162") 77-8 | 73°9| 101 | 123] 5 M 3 | 121-7 | 
}2 | R_ | 921/20 | 148 (204/146 | 70 | 80-3) 12°6| 12-7] 6 M 2 | 100°8 | 
21 | R | 943/18 | 135 | 18 6| 13-2 | 70-9| 7 5/17 | 11-2] 6 M 3 | 95°7 
| 25 | M | 77°7| 21-9] 166 | 75:8 | 21°6| 16-1 | 79°3 | 10 | 11-7| 5/6 M 2 |117 
1°8 | R | 85°8/ 186] 145 | — | 18-7] 14:8 | - 78°9 9°4 | 11-4 | 5 M 1 121°3 
25] R | 95°4/ 20:5] 15:4 | 75-1| 20°3| 15-4 | — 79°4 | 9-2) 12°2)| 5 P 1 | 132°6 
24) R 77°3|20 | 141 | 70 | 17 | 15 — | 83:3) 11-1 | 119] 5 M 3 | 107°2 
371 R 80°1 | 19°6 | 14°6 | 74:5 | 19°5 | 14°7 - |79 | 85/123] 6 P 2 | 144°7 
2S R | 92°7| 19°8| 14-7 SE Se 75°4|78 | 96)10°3] 5 Ss 4 |107°3 
29 R | 792/20 | 14°6*| 70:3] 19°8| 14°6 | — | 81°6| 7°6/11°9] 5 M 2 | 151-1 
| 27] RB | 78°9| 20-64 15-5" | 20°6 | 15°5 | 75-2 | 79-2 | 10-4] 123] 5 | M 1 | 1183 
} | | | 
| | | | | | iy Se ee Soe 
| 26] N 739/19 | 14-7 | | 192} 14° | 75:5 | 74-4 | 128 | 11s} 6 | M 2 | 983 
|} 29] M 88-4196} 15 | 765/196] 15 | 75-4) 101/113) 5 | M 1 | 111°8 
| 33] M | 823) 205] 16-2 | }20°6} 16 | 77-7|77:7| 109/107} 5 | M 3 | 98°2| 
| 27] -R | 944/203] 16-4 | — | 20-4} 15-7 |77 |761|102]/12 | 5 | S | 1 |1176| 
3°4 2 97°4/ 19°14] 14:8 | — 19°7 | 15°1 | 75°6|77°8|10°6] 114] 5 | M | 2 | 1076] 
33] M 851/20 | 15:6 |73 | 199] 15-4 | — | 78-4) 91} 112] 5 P| 1 | 1934 
2°9 N 71°2 | 20-2 | 14°3 | 70°8 | 19°8 | 14°38) — | 74:8) 121/121) 6 | My. {.4°.) 
3°4 M 79°5 | 21-4] 15-5" | | 21°6 | 15°5 | 71°8 | 76°6 11°7 | 12°4 me M 3 106 
| 2°8 N 76°4 | 20 15°3 | 120°3| 15:1 |78°3/ 783/119] 1181 6 | MS peed 
|28| R 822/19 | 15-1 | 795/19 | 149 71 |116/125| 6 M | 1 | 107°8 
2°9 | N 79°6 | 20 14°6 | 73 19°9 | 14°7 | 74°1 | 10°8 | 12°7 | 5 M | 2 | 117°6 
2°4 2 96°2 | 19 14:2 | 74-7! 18°8| 14°8 | — | 72:4] 9°9)11:1) 6 M 1 112°1 
2°8 M 87°8 | 20 14°9 . 20 15 75 75°2| 94 112) 5 M ir 119°1 
2°5 N 81°5 | 17-7| 13-2 | 74-6! .7°7| 13-28] — |70°8| 85/101} 5 | M | 3 | 1188 
3-2 M 75°4|17-4| 13°6 | 72:4) 17°3| 13°68] — | 75 96;11 | 5 | 8S 4 |1146 
2°4 N 81°0 | 17°1 | 13°4 | 78°4 | 17°2 | 13°4 — |68°8/;102/10°8| 5 | Ss 4 105°9 
2°3 R 100 17°2| 13 75°6 17°28) 138 — |70°5| 9°2|10°4 5 M 2 113 
371 N 645} 19 | 135 | 711) 18-7) 145 | — 70-1) 84/11 | 5 M | 2 /131_ 
3 M 87 118 | 13°7 | — | 18:1} 13°79] 75 7 | 73°9| 92 111] 5 M 2 | 120°7 | 
3°1 N 75°4 | 18-9 | 14°6 | 77-2 | 18°9% 14°63) — | 72°7| 85") 12 | 5 M 2 | 1411) 
2°3 N 80°4 | 18°4 | 13°3 — |18°7| 182 75°9 | 73°3 | 10°1 | 11°1 | 5 | M | 1 | 109°9) 
3 N 82°2 | 19°6| 14 19°8 | 14°3 | 72°2|743/105/116| 5 | M 2 | 110° | 
| 2°9 R 96 18°1 | 13 l 8 | 17°8 | 134 — | 724) 33) 11 | 5 | M 2 118°3 
| 2a R 96°1 | 17-7 | 13 |17°7| 13°2 | 746| 74:3) 98) 107) 5 Ss 3 | 109-2 | 
| 29] R 89°8 | 18-34 13-7 | — | 18°4| 13-9 | 75°5| 73-6) 9 | 102) 5 M | 1 | 1133 
| 3 M 95°2 | 19°5 | 14:2 -- 19°5 | 14°3 | 73°3 | 70°1; 9°7)12 5 5 4 | 123°7 
26| R 91°9/20 | 145 | 72°5|19°9| 14°3 | — | 77-2] 93/121) 5 M 1 | 130-1} 
| 2°2 R 90°8 | 17°8| 13°7 | 78°1|17°5| 13°6 — | 72°7 | 9 10°7 5 | M | 3 | 118°9 | 
| | 
promontory. 160. False promontory, 1st sacral has lumbar characteristics. 162. Wide outlet. 
Coccyx ossified to sacrum (not included). 164. Six sac. vert., one from coccyx(?). Small outlet. 
166. Six sac. vert., all sacral. Double promontory, rounded. 167. Six sac. vert., all sacral. 169. Six 
sac. vert., all sacral in character. 174. Flat pelvis. 175. “Male arch.” 181. Youth. Sacral
bodies not quite united.
187 |
188 
189 | 
190 
191 | 
192 
193 
194 
195 
196 
197 
















198 
199 
200 
201 
202 
203 
204 
206
207
214


| 215 
216 


| 217 
U.S.N.M.
No.


*21
*128
*68
*73
*34
*15
*28
*1
*57
*1, b
*64
*79


A.M.N.H.,
N.Y.
99/1677
99/1672
99/1668
99/1669

99/101
“ 2 ”
99/1619
99/1620
99/1623
99/1625
99/1626
99/2637 
99/1699 
99/1520 
99/2632 
99/2666 
22,183 
U.S.N.M. 
*60 
*149 
*148 


Average


Largest


Smallest


188. 
somewhat 
192, 


Flat pelvis. 
promontory. 


Tribe or Locality | Inter-crests, cms.
Arizona 24°7 
. 25°5 
ve 24 
” 27 
a 24°32 
ne 26 
os 26°6 
” 26°] 
a 26°5 
- 24°2 
rs 26°2 
” 24° 


Nimpkish, B.C. 


” 


Kwakiutl
N. W. Coast 
Nanaimo V, C. 


” 
N. Samich, B.C. 


” 
Victoria, B.C. 
N. Samich 


” 
Wallula, Oreg. 


Arizona 24°2 
$ 26 
+ 27 


| 
29°] 


| 
“Male type.” 189.
asymmetrical pelvis. 


Small outlet. 
196. 
Oblique
Diameter
11.7


14.9





INLET 
| a 2 is ‘ae Zc 
Ee 5 ~» 'o.s3 $ 
}3e | 8 3 |8se| & 
| 23 & |gza! 3 
| Ss = os - BA) 3 
| —— — 

96 | 1°8 | 78:7) 11:7 hu 
‘ 18 | 72 |106 | 9% 
| 92 | 26 | 76 | 12 9°5 
‘ 2°3 | 69°2|11°5 | 11°5 
119 | 1°4 |100 | 12 9°5 
8°8 | 22 | 67:2) 11°44 | 95" 
8-1 15 | 61°8] 11:4 | 10% 
9 | 17 | 647/11 10°5 
9°4 | 1:99 | 72°3|10°8 | 10° 
95% | 2 77°2 | 11°74 | 10°5 
10°9 | 2°1 | 82-6] 1171 | 10 
10-2 15 | 83°6|10°3 | 9°5 
11:2 14 | 83°6/13°2 | 11 
12°3 | 1:3 | 93°9|12°6 | 12°5 
ll 18 | 106-9 11-7 | 11°5 
115 | 1:3 | 91-3) 12 lid 
10 17 | 775/125 | 12 
10°5 1°5 80°9 | 12°2 ll 
ll 1°5 | 91:7; 9:5 | 11 
10 5 dee ee: aan be 11 
10 14 | 74:1/ 12-5 | 12 
10°5 a a 13 12°5 
10°4 6 | 70 115 1 
98 | 15 | 7% |12°7 {11 
12 18 | 92°3/)12°9 | 11°5 
104 | 16 | 79°4/11°2 | 11°5 
ll 12 | 846 12°6 | 10°58 
95 | 1:3 | 73°1/13°3 | 12°5 
99 | 15 | 76°7/ 11:1 | 12 
10 1:4 | 84°7/11°7 | 10 
10°3 | 1:7 | 79°2/10°5 | 95 
9 2 72°6|10°7 | 9 
10°68 | 1°76! 79°75! 11°59 | 10°82 
14 3°2 | 107-7 | 14°66 | 14 
75 | O8 | 61:5! 9 9 


Sacral canal open all but 0.1 cm. in middle.


194, 
Six sac. vert., one from coccyx (?).


* Original number. 


Specimen not yet permanently catalogued. 


198. 


PELVIC

s | 32 
S| 84/38 

3 3% | gs 

8 Piles 
= ay | ay Ee 
Soe a 
9°7 78 4°7 
8'5 6°2 5 
8°6 y he | 54 
10°7 8 4°5 
8:7" 7 6 
8'1 7* 4°6 
9°1 8 4°2 
9°7 7 5°8 
10°3 6°6 5°2 
10 6°3 56 
9°2 6°5 5°6 
8°3" 6°8 4°3 
9°5 8°2 6°2 
10°8 8°3 54 
10°8 7°56 5d 
10°6 8 5°6 
10 9°3 4°6 
9°6 8 4°8 
10 Ys 3°5 
10°5 7°5 5:4 
11 ’ ij 57 
11° 85 5°5 
10°2 7 5°3 
10°3 7 6°2 
10°5* 8 6°3 
10°9 7 4°6 
10°4 72 6'1 
10°6 9 5 
10°1 8°8 3°8 
8°5 ‘ 5 
8°8 6°5 4°8 
8°24 6°3 5°5 
9°79 | 7°56) 5°21 
12°75 | 11°7 7°5 
8 6 2°8 

190. Narrow, 


191. Six sac. vert., one from coccyx? “Male type.” Narrow outlet.
Transitional vertebra, left sacral, right lumbar type.
Six sac. vert., all sacral character.
Table of Pelvic Measurements of American Indian Squaws (continued).


A. B. Emmons 

OUTLET | INNOMINATE BONE | SACRUM
- riche PoeN Co Sieeaar 
= wey | ! | 
bain |e | 3:| : eS 4 4 Beentje, 
S2|s SB. | 3 | ds dz |» tae | de si/alala| #8) e¢ |g 1s 
ays ae Pol Salsa | SS] Sela | = |] BP] S g 4 q | 
2s | 2 ad 3 So) sa} 8/34/84 8/8/88! 8 5 5 be | 5 
3 : 2 3 4 | = ms | a” a = | 3 a a | | Py Ze oO 2? | a 
a | & -) | | 
ie aU Ue FS SOS ER Ss 7] | wee 
3 |28 | M | 82-9183 |13°6 | — | 18-4 | 136 |73-9|74:1| 9°1/ 10-7 5 | M 1 117°6 
3 3 | M_ | 802/183 | 14 76°5| 182 {136 | — | 71°8| 96) 11:3 5 | M 2 117°7 
tok) ON | 71°7| 186 | 14°2 | 76°3| 18-4 | 14* — | 77%) 11 111 5 Ss 3 100°9 
35 {28 | R | 98 | 183 | 15 82 | 183 | 14°6 — | 678; 96/|11°6 5 | M 1 120°8 
2°38 | 2:2 N 72°5 | 19°58 | 148 71°8 19°3 | 148 — | 802; 9°8| 11°6 5 | M 1 118*4 
2°8 | 32 N 711) 19°2 | 13°8 - |19°3 | 138°4 | 69:4 | 74:2 | 11°14 11°] 6 M 2 | 100 
| 33 |3 N | 79°8)18°7 | 14:4 — |19 1146 76-4 71°4; 10 11-1 5 S 1 | 111 
|3°5 |2 M | 882/181 |13°8 | — | 183 |13°5 | 73°7| 70:1) 9 | 109 5 M 1 | 101 
|2°8 | 3°8 R | 95:4) 20°4 | 15 73°5 | 20'S | 148 | — 197 9°7 | 10°8 5/6 M 2 /|111°3 
13-4 | 2°5 M 85°5| 18°5 | 14:2 | 76°8| 18-4 | 14:3 — |76°4|) 9°4/10°4 5 M 3 | 1106 
}3°2 | 2°7 M 82°9 | 20°2 | 14°6 - | 20°7 | 14°6 | 705) 79 | 11-4) 11-4 6 M 3 | 100 
is |3 M 806 191 | 14:2 | 743) 19-1" | 14-2" | — | 796) 97/113 S Se 1 | 1165 
| | ee | 
| 
| | 
}3°2 |2°8 M 72 19°3 | 15°2 — | 19% | 15°2 | 77-9 |.72°5 | 10°9 | 12°3 6 Ss 4 112°8 
|3°3 | 2°8 R 85°7 | 20°3 | 15°8 — | 20% | 16-1 | 78-5 / 81-7 | 12°3)123) 5/6 M 3 | 100 
138 | 1:4 M 92°3 | 19 14°1 | 74:2] 18-7 | 14 — | 82°6/ 11°4| 12 5 Ss 4 | 1053 
}3°7 | 2°5 M 88°3 | 20°1 | 15°8 — |20°3 |15°6 | 76°8| 78-1] 9:8] 12 5 M 1 | 192-4 
13:2 | 2°6 R 80 | 183 | 14 [18-3 | 14-4 | 78°7|73°5| 9-1] 11 5 S 1 | 120-9 
135 | 3-3 N 78°7 | 20°2 |14°8 | 76°!]19°7 |14°6 | — | 808! 8-2] 11:3 5 M 1 137°8 
|2°8 |3 R 105°3|18°3 | 13°6 | — | 18°9 | 13°8 | 73°0|82-2}10 | 11:3] 5 M 1 113 
34/3 M 95°5 | 20°8 | 14°6 | 70-2} 20-4 | 14% 80 9°8 | 11°5 5 M : Ris 
32 | 2°3 R 88 |17°9 | 14:2 | 79°3/)17°8 | 14°71 — | 746) 9:3] 116 5 S 2 | 124-7 
3°4 | 2°6 R 88°5 | 20°7 15°8 — |20°8 | 15°8 | 76 74°8 | 11°5 | 12°7 6 M 1 | 110°4 
33 | 1°8 R | 88°7 | 18°5 | 14°7 — |18°9 | 14°9 | 78°8 | 75°9 | 10°6 | 12°5 5 M 2 | 117°9 
25 | 2°5 M 81:1| 186 | 14°9 | 80°1| 18-2 | 1471 74°4| 9°7| 103 5 S 2 | 1062 
3/3 R 81:4 | 2071 | 15°8" | — | 20°1 | 15°8 | 78°6 | 77°6 | 11°6 | 12 6 s 2 | 103-4 
28 | 2°3 R 97°3 | 19°2" | 14:5 | — | 19-2 | 14°5* | 75°5 | 72°4] 9:3} 11 5 M 1 | 126-9 
}2°8 | 2°1 R 82°5 | 20°84 | 14°88 ~ |20°8 | 14:8 | 71°2 | 83-2 | 9-1 | 11-7 5/6 M 2 | 128°5 
|35 |23 | M 79°7 | 183 | 13°7 | 74:9 | 18-3" | :13°6 — | 76°2| 82 | 10°7 6/5 M 3 130°5 
31 {18 | R | 91 |18°7 | 142 | 765/186 | 1471 | — | 735] 96} 11°6 5 M 1 120°8 
3 126] N 726/181 13 _ |18-6 |12°8 | 68°8| 76-4) 9-7 11 5 s 3 | 113-4] 
2-4 | 2°5 M 83°8|19°1 | 14:2 | — | 19-2 | 142 | 74°5 | 73-8 | 10°8 | 11-7 6 _ See Sie 108°3 | 
Se \e4) N 76°6|19°5 | 14°5 | 744/194 |14°7 | — | 72-2] 10-2) 10°9 6/5 M 3 106°8 | 
| | | | | | 
| Ex st af 5 ee igi: Tea tes Tita Tiras “Sica, j | 
| N-44 | 4 segs. =3 | lst =90 
| 
| 3°35 | 2°81 || N-M-5 | 84-26) 19°27 | 14°52 | 74°6 | 19°32) 145 | — | 752/10 |115|/4/5= 2| S52 |2nd=59/ 115°8 | 
| a 
4°5 43 , M-92 [1089 | 22:2 | 16-9 | 841/225 |168  — |90 | 13-2|13-7| 5=169| M146 | 3rd=49| 174-6 
| | | 
2°5 | 1-1 ce 64 | 16 11°3 | 67°8| 15°7 | 11°15 | — | 63°7 xe 92/5/6= 11| P18 Sepsiiaw 86 | 
? 
. R-71 | | | 6= 32 |reversed 1) 5th=3 
| | | | | 
| t | | | 








199. Transitional vertebra, free on right, one from coccyx. 200. Narrow pelvis. 207. Six sac.
vert., all sacral. No false promontory. 210. Six sac. vert., all sacral. No false promontory.
212. Transitional vertebra, right free, but articulates with ilium, left sac. False promontory. 213. Transitional
vertebra, left free, right sacralized. False promontory. 216. Six sac. vert., one from coccyx.
217. Transitional vertebra, right sacralized, left free. No false promontory. † False promontory = 18.
‡ Double promontory = 2.
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THE INTENSITY OF NATURAL SELECTION 
IN MAN. 


SECOND PAPER. 
By E. C. SNOW, M.A., D.Sc.


THE present paper is a supplement to the memoir of the same title issued last
year*. It is not proposed to give here any account of the work which has been
done in the attempt to elicit information on the more difficult subject of the nature
of selection in man, but only to publish the correlations and regressions obtained
by using an alternative measure of environment, and by varying the periods in
which the effects of a selective death-rate can be detected.

The adoption of another method of correcting for environment implies not the 
slightest shaking of my confidence in the adequacy and validity of that employed in 
the first memoir, but the further work was entered upon because the importance of 
the subject renders the comparison of the results reached by the use of the various 
possible methods particularly desirable. The only criticism I have seen of the 
mode of measuring environment used in the earlier work is that by the Editors of 
the Journal of the Royal Statistical Society. Had there been a tithe of evidence 
supporting the view adumbrated in that criticism the memoir would have been 
practically valueless. Fortunately, however, no arguments whatever have been 
put forward in favour of the view held by the statistical critics, and very cogent 
facts against that view have already been given†. It is quite beside the point to
show that the corrected standard deviation of the total mortality in the two
periods considered is only 5% or 6%. That standard deviation is in a number
of cases appreciably of the same magnitude as the corresponding measure of
dispersion in the earlier of the periods used, and, moreover, is many times its
probable error. It matters not whether that standard deviation be 6% or 0.6%
or 60% of the mean value.


* Drapers’ Company Research Memoirs, “Studies in National Deterioration,” No. VII. Dulau
and Co. 1911.


† Biometrika, Vol. VIII. p. 456, 1912.
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The notions used to indicate environmental conditions for this new work are 
very simple. Shortly, they consist in making the mortality of a cohort of one sex 
the measure of the environment for the corresponding cohort of the opposite sex 
on which it is desired to ascertain the possible effects of selection. Thus if we 
wish to investigate the selective effect on the male mortality of the third, fourth 
and fifth years of life of variations in that mortality in the first two years, in 
addition to fixing the size of the male cohort we fix the size of the corresponding 
female cohort and also the total female mortality in the first five years of life. We 
can thus suppose that we are dealing with districts in which the female mortalities 
up to five years of age for the cohorts born in a particular year are the same. For 
these districts we find varying male mortalities in each of the periods considered 
(see Table below), and the mean values of these male mortalities in both periods 
throughout the whole series of districts can be found. Do the mortalities in the 
second period of those districts whose mortalities in the first period deviated in 
the positive direction deviate, on the average, in the positive or negative direction ? 
Districts with the same female environment will possess varying proportions of 
male weaklings. If these weaklings are killed off in the earlier period, the popula- 
tion which survives to the later one is stronger and likely, therefore, to have a 
smaller mortality, and this would be indicated by a negative correlation between 
the mortalities in the two periods (with the proviso dealt with in § XXXIV of the
memoir). To the criticism that the total male mortality is highly correlated with 
the total female mortality, and that by making the latter constant we are 
practically fixing the former, we can reply by pointing to the considerable 
standard deviation of the total male mortality when correction is made for 
constant female mortality (see Table below). Evidence of a more general 
character, too, can be gathered by turning over the leaves of any of the Registrar- 
General’s valuable Decennial Supplements to his Annual Reports. Pick out a 
few of the registration districts in which the mortality of one sex for any of the 
age-groups given is practically the same and compare the mortalities of the other 
sex among those districts for the same age-group. Quite appreciable variation in 
the numbers will be found to exist. Not many of such districts can actually be 
found, but the method of partial correlation essentially consists of a contrivance
by which we can for statistical purposes reduce all districts to a constant type. 
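The contrivance referred to may be illustrated numerically. The sketch below, in Python, assumes the usual first-order formula $r_{ab\cdot c} = (r_{ab} - r_{ac}r_{bc})/\sqrt{(1-r_{ac}^2)(1-r_{bc}^2)}$; the function name `partial_r` is an illustrative choice and not the memoir's notation. The inputs are the total correlations for the 1870 English male cohort given in the list of constants at the end of this paper, with $x_3$ as the variable rendered constant.

```python
import math

def partial_r(r_ab: float, r_ac: float, r_bc: float) -> float:
    """First-order partial correlation of a and b with c held constant."""
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac**2) * (1 - r_bc**2))

# Total correlations, 1870 English male cohort (closing list of constants).
r01, r03, r13 = 0.884086, 0.852236, 0.930579
r12, r23 = 0.608255, 0.775362

r01_given_3 = partial_r(r01, r03, r13)  # births and early deaths, x3 constant
r12_given_3 = partial_r(r12, r13, r23)  # early and later deaths, x3 constant
print(round(r01_given_3, 4), round(r12_given_3, 4))
```

Both results agree with the first-order values tabulated for that cohort (+.475208 and −.490155) to about three decimal places, the small residue being presumably due to rounding of the printed constants.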


No method of measuring environment can be theoretically perfect. Districts 
under the same environmental conditions would not have the same mortality, but 
the latter would be distributed in some way due to random causes. Of two districts 
under the same general environment one may in a particular year suffer to a 
greater extent from epidemics of measles, scarlet fever or summer diarrhoea, and 
part of the problem of selection consists in ascertaining if these epidemics strike 
more at the weaker children than at a random sample of all children, and this is 
ascertained by inquiring if the surviving population is more immune in a subsequent 
period. The two methods which are employed in this paper to measure environ- 
ment are quite distinct and bear no physical relation to each other. In the one 
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method the mortality of the cohort corresponding to the cohort whose history is 
being traced but of the opposite sex, is fixed. In the other, the number of deaths 
of the same sex and between the same age limits in the period under notice, 
apart from the deaths occurring within the cohort, is rendered constant. Neither 
of these, of course, allows for the fact that under a perfectly uniform environment 
the mortality which is taken to indicate that environment should be distributed in 
some way due to random causes. But the general similarity which will be shown 
to exist between the results reached by using the two distinct methods is some 
justification for the claim that each is a satisfactory approximation to the theo- 
retically best method. 


The investigation which is now being described was directed throughout to 
ascertaining the extent of selection in the mortality of the first two years of life. 
In the earlier work of which an account is given in the memoir other periods were 
taken, but later some evidence was adduced to indicate that the first two years 
of life was a natural interval to adopt, as embracing roughly the whole of the 
mortality of infancy and overlapping but very little that of childhood. For the 
second period, on the mortality of which the selective character of that of the 
earlier period is indicated, the next three years of life are taken both for the 
English and Prussian data; in the case of the latter, also, the next eight years are 
employed as a second period. We thus reach results obtained from English and 
Prussian data by working at the same periods in each case, and the comparison of 
these results is of interest. The notation employed throughout is:


$x_0$ = Births of the male or female cohort considered, in, say, year $t$.
$x_1$ = Deaths in the cohort in the two years, $t$ and $t+1$.
$x_2$ = Deaths in the cohort in the next three years, or next eight years.


$x_3$ = Remaining deaths of same sex as cohort in the five years or ten years (see
Memoir, § vii).


$x_4$ = Deaths in the corresponding cohort of opposite sex in the five years or ten
years considered.


$x_5$ = Births of the corresponding cohort of opposite sex.


Previous experience suggested that the correction for a constant value of $x_5$ in
addition to constant values of $x_0$ and $x_4$ would have little effect on the correlation
between $x_1$ and $x_2$. The first case worked out supported this view; in that (the
Prussian male cohort of 1881, dealing with the first ten years of life) the correlation
was only altered in the sixth figure by the extra correction, viz. from −.944206 to
−.944209. Thus the considerable labour involved in making this further correction
is not justified by the extra value it gives to the results, and in all the other
cases it was omitted.
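The smallness of this extra correction can be checked from the constants printed at the end of the paper. The sketch below, in Python, assumes the usual recursion in which one more variable is rendered constant by applying the first-order formula to partial correlations of the next lower order. The function name is illustrative, and the reading of the two auxiliary constants as the correlations of $x_1$ and of $x_2$ with $x_5$ for constant $x_0$ and $x_4$ is an assumption, since their labels are damaged in the source.

```python
import math

def add_constant(r_12: float, r_1k: float, r_2k: float) -> float:
    """Hold one more variable k constant.

    All three arguments must be partial correlations of the same order.
    """
    return (r_12 - r_1k * r_2k) / math.sqrt((1 - r_1k**2) * (1 - r_2k**2))

# 1881 Prussian male cohort, ten-year period; x0 and x4 already constant.
r12_04 = -0.944206  # early and later deaths
r15_04 = -0.013340  # early deaths and births of opposite sex (assumed label)
r25_04 = 0.008833   # later deaths and births of opposite sex (assumed label)

r12_045 = add_constant(r12_04, r15_04, r25_04)
print(f"{r12_045:.6f}")
```

Because both auxiliary correlations are close to zero, the numerator and the denominator are each altered only in the fourth decimal place, and the two alterations very nearly cancel; this is why the correlation moves only in the sixth figure.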


The partial standard deviations, correlations and regressions with their probable 
errors for the various sets of data are given below, the other standard deviations 
and correlations on which they are based being shown at the end. In the table 
on p. 61, ${}_{04}\sigma_{1+2}$ denotes the partial standard deviation of the total mortality in the
sum of the periods considered, ${}_{04}\rho_{12}$ denotes the expected (partial) correlation if
there were no selection (see § XXXIV of the memoir), and ${}_{04}b_{21}$ denotes the (partial)
regression of the mortality of the second period on that of the first.


Dealing first with the results from the English data, we notice that the regres- 
sions by the two methods for males are fairly similar, but for females they are, on 
the whole, smaller by the new method. Having regard to the probable errors we 
can draw no inferences concerning the differences. The correlations for females, 
however, are in two of the three cases considerably smaller by the new measure of 
environment, and this difference appears to be significant. The partial standard 
deviations by the two methods occasionally show fairly large differences, but in no 
single case is the disagreement significant. For all the six cases, however, ${}_{04}\sigma_1$ is
less than ${}_{03}\sigma_1$, but ${}_{04}\sigma_2$ is greater than ${}_{03}\sigma_2$. The mean of the male regressions by
the first method of measuring environment is −.142 and by the other −.143, the
corresponding figures for females being −.179 and −.117. The mean regressions
of the mortality of the 4th and 5th years of life on that of the first three years
are −.085 and −.172 respectively*. Thus, so far as males are concerned, the
intensity of selection appears greater when measured by the regression of the 
mortality of the 3rd, 4th and 5th years on that of the first two years of life than 
when measured by the regression of the mortality of the 4th and 5th over the first 
three years, but the same conclusion does not so definitely hold for females. It may 
be, as was suggested in the memoir, that the age division between infant and child 
mortality is not the same for females as for males, and the inference is put forward 
tentatively that the ailments of infancy (as distinct from those of childhood) 
attack females to a rather greater age than they do males. It will be noticed, too, 
that the regressions and correlations for the 1872 cohort are smaller than for the 
other cohorts, and that this is accompanied by the fact that the mortality of that 
cohort was smaller in the first period. On the whole, so far as the data for the 
English rural districts are concerned, the adoption of a new measure of environ- 
ment leads to no alteration of view as regards the existence of selection, nor, 
roughly, of its numerical intensity. 


When we turn to the results from the Prussian data in which the same periods 
(the first two years and the next three) are used as for the English data, the 
most marked feature to be noticed is the considerably larger correlations and 
regressions which are obtained. The mean value of the regressions for males 


* I take here the opportunity of correcting a mistake which occurred in connection with the work of 
the first memoir, through an error in transcribing from the schedules containing the raw data. In 
§ xvim, in the portion of the Table for the 1872 cohort referring to females, the following alterations 
should be made : 

#,=487, o,=119-7, 

Ta = *812328, Ti2>= “718137, Ti3>= ‘944379, sou = + 464859, 32> *168670, 103712>= — “184068, 
These entail the following corrections in the Table on p. 33 for the same cohort: o30;=34°844 and 
partial regression= —°0591. These alterations reduce the correlation and regression but make the 
results more consistent, and necessitate little modification of the conclusions drawn from them. 
is −.794 by the first method and −.647 by the second, while for females the
figures are −.798 and −.878, these comparing with corresponding values for the
English districts ranging from −.12 to −.18. Thus the criterion which we take
as the measure of the intensity of selection was for the Prussian cohorts of 1881 
and 1882 about five or six times as large as that for the English cohorts of 1870, 
1871 and 1872. We can assert with some confidence a considerably greater 
selective effect of the mortality of the first two years of life on that of the next 
three in the case of Prussian rural districts than of the English rural districts in 
the epochs considered, and this fact is concomitant with a far greater stringency of 
infantile conditions in the former than in the latter. This is seen from the follow- 
ing figures. 





                |  Mean Number of Deaths in  |  Mean Number of Deaths in the next
                |  First Two Years divided   |  Three Years divided by Mean Number
                |  by Mean Number of Births  |  of Births minus Deaths in First
                |                            |  Two Years
                |    Male     |    Female    |     Male       |     Female
 England  1870  |    .185     |    .157      |  .037          |  .036
          1871  |    .174     |    .146      |  .038          |  .037
          1872  |    .164     |    .139      |  .041          |  .041
 Prussia  1881  |    .241     |    .214      |  .069 (.102)   |  .067 (.102)
          1882  |    .260     |    .228      |  .075 (.107)   |  .074 (.107)


The figures in brackets give the corresponding numbers for Prussian districts 
for the mortality in the eight years following the first two. 


The data for English rural districts do not allow us satisfactorily to follow the 
cohorts beyond their first five years of life, so that we cannot assert that the 
intensity of selection is generally less in England than in Prussia for the popula- 
tions considered, but we can point out definitely that the effect of selection in the 
first five years of life was much greater in the latter country than in the former. 
Whether or not the English cohorts make up the leeway at later ages can only be
a matter of speculation. We can at present merely state that whereas a district 
in England which had an excess of 100 male survivors above the mean for all 
districts at the end of the first two years of life had, on the average, about 14 of 
these survivors killed off in the next three years, a similar district in Prussia lost 
more than 70 of the 100 in the same period. 
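The arithmetic behind this statement may be sketched as follows. It is assumed here that the regression coefficient is the mean change in the deaths of the second period per additional death in the first period, the births being constant, so that an excess of 100 survivors is a deficit of 100 early deaths. The Python below uses the mean male regressions quoted above; the function name is illustrative only.

```python
def later_deaths(b: float, excess_survivors: float = 100.0) -> float:
    """Expected excess of second-period deaths for a given excess of survivors,
    where b is the regression of second-period on first-period deaths."""
    early_death_deviation = -excess_survivors  # more survivors = fewer early deaths
    return b * early_death_deviation

# Mean male regressions quoted in the text (first and second methods).
english = [later_deaths(b) for b in (-0.142, -0.143)]
prussian = [later_deaths(b) for b in (-0.794, -0.647)]
print(english, prussian)
```

Both English values come to about 14 per hundred, while the two Prussian values lie on either side of 70, their mean being a little above it.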


When, for the Prussian data, we come to deal with the results of including the 
eight years following the first two, we find that the regressions for the 1881 cohort 
have increased appreciably, but those for the 1882 cohort have not done so. There 
is nothing incongruous in this, as in one cohort selection might well be felt more 
in the 3rd, 4th and 5th years than in the other, and in this latter the effect would
then probably come in later years. The tendency of the partial standard deviations
for the Prussian data is opposite to that for the English, viz. for males ${}_{04}\sigma_1$
is greater than ${}_{03}\sigma_1$ and ${}_{04}\sigma_2$ is less than ${}_{03}\sigma_2$. For females, however, these are
reversed, except for the 1882 cohort in the ten-year period. Another feature of
the Prussian results is that the regressions for males are smaller by the new 
method of measuring environment, but for females the reverse is the case. This 
is also true for four out of the six examples from English data, and arises chiefly 
from the differences in the male and female variability in mortality in the second 
period compared with the first. In the first two years of life the (partial) standard 
deviation for males is always greater than for females, but for the second period 
the female (partial) standard deviation is in some cases the larger, the mean 
mortality in this second period being about the same for the two sexes. 


In the memoir (§ XXXIV) a short discussion is given to the question of what
amount of correlation between the mortalities should be expected if selection were
entirely absent. This has been referred to as ${}_{03}\rho_{12}$ and ${}_{04}\rho_{12}$ in the present paper.
These values are only intended as approximations, and it would undoubtedly be an 
advantage if by direct correlations we could obviate the use of such corrections. 
These direct values could be obtained by correlating the mortality rate of the first 
two years of life (based on the number of births) of the cohort with the mortality 
rate in the 3rd, 4th and 5th (or 3rd to 10th) years of life (based on the number of 
survivors to the age of two), correction being made in some manner for a constant
environment rate. This would entail a correlation between such variables as $x_1/x_0$
and $x_2/(x_0 - x_1)$, and in my opinion might involve an element of ‘spurious’ correlation,
and for this reason alone rates were not used in the memoir. So far as I can
understand, however, the critics of the memoir do not hold this opinion, so that to 
them the corrected correlation between two variates of the above type is probably 
as satisfactory as a partial correlation of the third order. The employment of such 
correlations saves considerable labour and requires no discussion of the question 
of ‘expected’ correlation if selection were inoperative. Accordingly, for a few 
cases, the following new variables have been taken : 


$z_0$ = Male or Female Deaths in the first two years of life divided by Male or Female
Births,


$z_1$ = Male or Female Deaths in next three (or eight) years of life divided by number
of survivors to the age of two,


$z_2$ = Total Female or Male (i.e. of opposite sex to $z_0$ and $z_1$) Deaths in the whole
five (or ten) years divided by Female or Male Births,


and the values of ${}_{2}r_{01}$ worked out. The statistical constants on which they are
based are given below:
                 | ENGLISH RURAL DISTRICTS |        PRUSSIAN RURAL DISTRICTS FOR 1882
                 |        FOR 1870         |      Five Years       |       Ten Years
                 |   Male    |   Female    |   Male    |  Female   |   Male    |  Female
 $\bar{z}_0$     |  .184     |  .156       |  .241     |  .209     |  .241     |  .200
 $\bar{z}_1$     |  .037     |  .036       |  .069     |  .069     |  .100     |  .103
 $\bar{z}_2$     |  .187     |  .214       |  .264     |  .292     |  .290     |  .315
 $\sigma_0$      |  .0201    |  .0184      |  .0541    |  .0508    |  .0541    |  .0508
 $\sigma_1$      |  .0079    |  .0092      |  .0263    |  .0241    |  .0291    |  .0253
 $\sigma_2$      |  .0232    |  .0219      |  .0611    |  .0653    |  .0595    |  .0647
 $r_{01}$        |  .271543  |  .463227    |  .634918  |  .633825  |  .596721  |  .576631
 $r_{02}$        |  .854400  |  .877328    |  .951975  |  .975109  |  .937904  |  .967319
 $r_{12}$        |  .528884  | [illegible] |  .817686  |  .755164  |  .811763  |  .723951
 ${}_{2}r_{01}$  | −.409     | −.087       | −.814     | −.705     | −.813     | −.707
 ${}_{2}\sigma_0$|  .0104    | [illegible] |  .0166    |  .0113    |  .0188    |  .0129
 ${}_{2}\sigma_1$|  .0067    |  .0076      |  .0151    |  .0158    |  .0170    |  .0174
 ${}_{2}b_{10}$  | −.262     | −.075       | −.744     | −.990     | −.734     | −.958





If the values of ${}_{2}r_{01}$ here are compared with the values of ${}_{04}r_{12}$ in the earlier
tables for the corresponding cases we find very little difference. Thus:


Correlations obtained without using Rates,
−.415   −.158   −.917   −.768   −.920   −.789
Corresponding Correlations by use of Rates,
−.409   −.087   −.814   −.705   −.813   −.707
If we reduce the figures in the first of these by the values of ${}_{04}\rho_{12}$, we have the
sequence
−.347   −.105   −.809   −.710   −.781   −.701
The agreement between the last two lines of figures is surprising and remarkable.
In all cases we get approximately the same numbers as before, a quite unexpected
result.
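The rate-based figures can be reproduced from the tabulated constants. The sketch below, in Python, assumes the standard first-order formulae: the partial correlation of $z_0$ and $z_1$ for constant $z_2$, the partial standard deviations $\sigma_0\sqrt{1-r_{02}^2}$ and $\sigma_1\sqrt{1-r_{12}^2}$, and the regression of $z_1$ on $z_0$ as the partial correlation multiplied by the ratio of those standard deviations. Function and variable names are illustrative; the inputs are the 1870 English male column of the table of rates.

```python
import math

def first_order(r01, r02, r12, s0, s1):
    """Return (partial r, partial SD of z0, partial SD of z1,
    regression of z1 on z0), all for constant z2."""
    r = (r01 - r02 * r12) / math.sqrt((1 - r02**2) * (1 - r12**2))
    sd0 = s0 * math.sqrt(1 - r02**2)
    sd1 = s1 * math.sqrt(1 - r12**2)
    return r, sd0, sd1, r * sd1 / sd0

# English rural districts, 1870, males (table of rates).
r, sd0, sd1, b = first_order(0.271543, 0.854400, 0.528884, 0.0201, 0.0079)
print(f"r = {r:.4f}, sd0 = {sd0:.5f}, sd1 = {sd1:.5f}, b = {b:.4f}")
```

The results agree with the tabulated −.409, .0104 and .0067, and with the regression −.262 to within a unit in the third decimal place, which is as close as the four-figure standard deviations allow.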


Lest these results should hastily be pointed to as evidence against the possibility
of spurious correlation arising when such variables as $x_1/x_0$ and $x_2/(x_0 - x_1)$ are
correlated, it should be pointed out that the deviations from the mean values of
the variables are in some cases considerable, and that the third and fourth powers 
of those deviations cannot be neglected in comparison with the corresponding 
powers of the means. Thus the formula which is usually exhibited to show the 
possibility of spurious correlation does not apply to this case. Whatever the 
magnitude of the ‘spurious’ element, if any, involved in the correlations just found, 
they can only be construed as supporting those previously found and as evidence of 
the existence of selective mortality in the populations dealt with. 


Professor Pearson, also, has pointed out to me another possible mode of
attacking the problem. This is to render constant $x_0 - x_1$ instead of $x_0$ and in
addition to $x_3$ or $x_4$ as before. This would fix, not the number of births, but the
number of survivors at the end of the first two years. Then the population liable 
to mortality in the second period would be the same for all districts, but the 
districts with the larger number of births would be exposed to the possibility 
of greater mortality in the first period, and therefore to the possibility of greater 
mortality of the kind which is taken to measure environment (the mortality in 
the first two years being much greater than in the next three or next eight). 
Thus an ‘expected’ (negative) correlation if selection were absent would not arise 
in the same manner as before, but would probably be entailed in the partial 
correlation. A priori, however, it does not appear that this method would produce
a correlation, if selection were inoperative, of the same magnitude as that
indicated in § XXXIV of the memoir, since the population rendered constant stands
in an intermediate position to those at the beginning and end of the periods
considered. The correlations under these new conditions have been worked out
only for the case of the 1882 Prussian cohort, both male and female. The results
differ but very little from those reached before, and must, I think, be taken
as supporting the substantial accuracy of the interpretation put upon the earlier
ones. If $x_5$ denote $x_0 - x_1$, the following are the additional correlations:





                   |   Males    |  Females
 $\sigma_5$        |  3117.6    |  3175.1
 $r_{15}$          |  .924739   |  .928746
 $r_{25}$          |  .847138   |  .851981
 $r_{35}$          |  .924711   |  .942791
 $r_{45}$          |  .927020   |  .934079
 ${}_{45}r_{12}$   | −.918      | −.747
 ${}_{35}r_{12}$   | −.780      | −.806


The values previously found for ${}_{04}r_{12}$ and ${}_{03}r_{12}$ were −.917 and −.786 for males,
and −.768 and −.797 for females. It appears therefore to be of little account
whether we make $x_0$ or $x_0 - x_1$ measure the size of the populations; we should
probably, too, get similar results if we put $x_0 - x_1 - x_2$, i.e. the size of the cohorts at
the end of our survey, constant.


On the whole, the work of which this paper gives a short account has justified 
itself by the confirmation and emphasis it gives to the results previously obtained. 
The general impression received by a study of the results reached by the employ- 
ment of the new method of measuring environment alone is much the same as 
that derived from a survey of those by the earlier one, though individual differences 
of appreciable magnitude occur. Apart from the emphasis it gives to the results 
of the memoir, the present work has discovered, I think, a significant difference in 
the operation of selection on the mortality of the first five years of life in Prussian 
and in English rural districts, and suggests (but, at present, no more than suggests) 
that there is some differentiation in its effect upon the two sexes. But the 
existence of a selective death-rate in the general populations dealt with admits of 
no doubt. 
LIST OF STATISTICAL CONSTANTS ON WHICH THE PARTIAL CORRELATIONS
AND REGRESSIONS ARE BASED.


English Rural Districts.

                 |      1870 COHORT      |      1871 COHORT      |      1872 COHORT
                 |   Males   |  Females  |   Males   |  Females  |   Males   |  Females
 $\bar{x}_0$     |  3227     |  3090     |  3226     |  3114     |  3291     |  3150
 $\bar{x}_3$     |  2669     |  2240     |  2635     |  2205     |  2634     |  2189
 $\bar{x}_4$     |  579      |  694      |  550      |  663      |  548      |  650
 $\sigma_0$      |  488.3    |  467.8    |  506.6    |  480.8    |  520.3    |  477.0
 $\sigma_1$      |  127.7    |  105.6    |  130.0    |  102.6    |  126.0    |  107.4
 $\sigma_2$      |  27.5     |  30.6     |  23.1     |  28.4     |  33.0     |  31.3
 $\sigma_3$      |  581.0    |  508.0    |  582.3    |  527.2    |  605.9    |  545.1
 $\sigma_4$      |  128.6    |  146.1    |  120.7    |  145.8    |  133.5    |  153.8
 $r_{01}$        |  .884086  |  .856073  |  .859467  |  .834253  |  .793680  |  .814537
 $r_{02}$        |  .637792  |  .643144  |  .609366  |  .513438  |  .716133  |  .662323
 $r_{03}$        |  .852236  |  .831001  |  .835223  |  .797029  |  .807817  |  .752425
 $r_{04}$        |  .849348  |  .881602  |  .838168  |  .836549  |  .824623  |  .777569
 $r_{12}$        |  .608255  |  .688624  |  .638919  |  .581274  |  .804386  |  .791884
 $r_{13}$        |  .930579  |  .925657  |  .931412  |  .934695  |  .945229  |  .932208
 $r_{14}$        |  .952680  |  .958944  |  .947168  |  .970562  |  .969370  |  .965413
 $r_{23}$        |  .775362  |  .845435  |  .799258  |  .757490  |  .886385  |  .874839
 $r_{24}$        |  .706226  |  .749785  |  .776551  |  .627105  |  .838194  |  .842120
 ${}_{3}r_{01}$  | +.475208  | +.412648  | +.362379  | +.415864  | +.156490  | +.475066
 ${}_{4}r_{01}$  | +.467000  | +.079683  | +.323442  | +.169218  | −.040919  | +.389982
 ${}_{3}r_{02}$  | −.069637  | −.199985  | −.176081  | −.229035  | +.000352  | +.012762
 ${}_{4}r_{02}$  | +.101579  | −.057213  | −.120807  | −.026164  | +.080835  | +.022190
 ${}_{3}r_{12}$  | −.490155  | −.464969  | −.482395  | −.546204  | −.221368  | −.135040
 ${}_{4}r_{12}$  | −.299955  | −.161886  | −.478055  | −.145896  | −.060726  | −.150136

Prussian Rural Districts (Five-year period). 
1881 Conort 1882 Conort 
Males Females Males Females 
i | 
rN 9407 8917 9297 8793 
&s 11723 | 10083 12047 10363 
| 2384 2760 2510 2938 
| 4586°9 4398°1 4689°5 4504°8 
ee 1502°2 1276°4 1654°1 139771 
lege 343-2 324:2 375°6 360°7 
| a3 7758°1 | 6734°5 8044°8 6997°8 
| os 1558°6 1794°5 | 1716°2 1981°5 
01 964196 | 963495 —s| “967685 “965286 ‘ 
ros 844586 | ‘846971 | -860460 ‘869316 
Tos 964747 | ‘961005 | 967523 967466 
Tos ‘963378 970151 “963954 973849 
ro 817746 "844333 "841394 “864055 
113 990456 ‘990191 "995000 "995279 
14 ‘988301 994536 “989235 995822 
Ig ‘884689 | 903554 *881938 ‘901711 
194 “891353 "882250 906736 ‘897541 
01 + 238636 | ++308402 + 198035 + 097214 
| sor + °295597 — 053527 + °362344 — +216609 
Ee — 072672 | —-+196905 + 060222 — 027966 
02 —116186 | —-078351 —+121140 — “047446 
eons —+910479 — 919631 — *756884 ~ *795986 
| alia — 913825 — 673466 — *892589 —°738506 
Prussian Rural Districts (Ten-year period).


1881 Cohort


‘971578 
“969132 
| *825960 
991230 
983883 
*889151 
"909835 
+°119558 
*301348 
*037627 
*134827 
— ‘916019 
— 932723 


b++ 


Also for the 1881 male cohort: 
999225, 


’33= ‘967678, 


ss = — 131819, 


1882 Cohort








Females | Males | Females 
| 

8917 9297 8793 

24650 23137 24480 

3000 2733 3156 
4398°1 4689°5 4504°8 
1276°4 1654°1 | 1397°1 
4450 470°7 449°9 
15723°8 18015°8 | 15645°2 
1901-3 1806°8 | 2073°1 
963495 “967684 "965285 
"871859 "895378 913858 
‘968570 ‘975529 =| «= 978221 
‘973037 968885 ‘977710 
"854617 "860826 "883242 
991070 995812 995626 
‘993559 “988210 994336 
909771 "890463 911441 
902419 “923618 “921165 
+°107783 | —-186994 — ‘221147 
—*125303 | +-271614 +°161865 
— 090243 + *266908 + :260740 
— :062670 + 005249 — *308612 
— 849620 — °622720 — °629876 
— ‘860001 — 884404 — ‘790688 

T= °965417, T= ‘866632, 

3705 = +°987800, 3745 = + °295690, 


03715 = — 


‘013340, 


oes 12 = — 944209. 


o3!"95 = “+ 008833, 
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ON ERRORS OF RANDOM SAMPLING IN CERTAIN 
CASES NOT SUITABLE FOR THE APPLICATION OF 
A “NORMAL” CURVE OF FREQUENCY. 


By M. GREENWOOD, Junr. 
(From the Statistical Laboratory of the Lister Institute of Preventive Medicine.) 


I. 

Introduction. 

THOSE who believe that the more closely a branch of knowledge adapts itself 
to the principles of quantitative reasoning, the more justly it merits to be ranked 
as a science, have been gratified by the improved standard adopted in the treat- 
ment of medico-statistical results. It is true that even now medico-statistical 
writers fall short of the attainments regarded as essential in some other depart- 
ments of natural knowledge, and that a few prominent investigators vaguely 
denounce “ mathematicians ”—by which term is to be understood any one trained 
to employ modern biometric methods—as presumptuous intruders within the 
sphere of experimental medicine. Despite these obstacles, progress has been 
marked within recent years and we may have considerable confidence that future 
discussions as to the value of such procedures as vaccination or the determination 
of an opsonic index will be conducted with due regard to the claims of exact 
science. 


It is, however, in the nature of things that a reform of this magnitude should 
be accompanied by certain disadvantages which tend to impede the march of 
ideas. For instance, reformers may urge that the employment of certain argu- 
ments requires for logical validity the application of some specific test. After 
much discussion, the point is conceded and then the test is in danger of being 
applied in other and unsuitable instances. 

The particular illustration which has prompted these remarks is the employ- 
ment of some consequences of the current theory of errors of random sampling in 
certain cases which frequently arise in medical and pathological work. At one 
time it was customary to base conclusions as to the efficacy of some method of 
treatment upon short series of cases without any statistical test being employed. 
A practitioner might find, for example, that of 100 cases of typhoid fever treated 
without any special precautions as to diet, six had died. Of a subsequent 100, 
dieted in a particular way, but two succumbed and a conclusion very favourable 
to the new method of treatment might be ventured. Owing to the partial 
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permeation of medicine by quantitative methods, it is widely felt that this proce- 
dure is questionable and that the “ probable error” of the result must be found. 
The medical writer who has attained this level accordingly refers to a text-book 
and tests his proportions upon the basis of a “normal” curve of errors with the 
binomial standard deviation $\sqrt{npq}$.

The specialist in mathematical statistics is aware that the time-honoured 
theory of the “ probable error” rests upon certain assumptions of a quite definite 
character not adequately fulfilled in the imaginary case described. Warnings as 
to this are given in the better text-books, and are indeed unnecessary for those 
who care to read the proofs of the usual formulae. 


We must, however, bear in mind that not every medical man has either the 
time or the training requisite for the comprehension of mathematical analysis and 
many will be inclined to consult a book which, while giving formulae without 
proofs, contains explicit instructions as to their practical employment. Such a 
book as, for instance, Professor Davenport’s Statistical Methods, seems admirably 
adapted to the needs of the laboratory worker. On p. 14 (2nd edition) he will 
find the following sentence :—‘ The probable error of the determination of any 
value gives the measure of unreliability of the determination; and it should 
always be found.” The statement is commendably clear but, unfortunately, quite
incorrect in many cases which come under the notice of the medical inquirer. 

The present memoir is an attempt to make the limitations of the process 
recommended by Professor Davenport arithmetically obvious to the medical 
reader, and to provide the latter with some assistance in the exceptional cases.
To the trained mathematician or biometrician I have nothing to offer which is 
novel and little which is of interest, while the medical reader may find some 
difficulties in following every step of the inquiry. I hope these difficulties have 
been reduced to a minimum, but a risk of falling between two stools has to be 
faced by any writer dealing with a subject not new in itself but relatively so in its 
applications. My biometric colleagues will recognise the difficulties of the task, 
and are alone competent to determine the measure of success or failure achieved. 


II.


The chance of an event happening is $p$ and of its failing $q$ $(p + q = 1)$. What is
the “probable error” of $pm$ successes in $m$ trials?

In the problem stated the probability for the occurrence of an event and the 
independence of the happenings are supposed to be known. This either means 
that they have been ascertained by long experience or that their values (i.e. the 
value of p and the zero correlation between the results of successive trials) are 
defined by an hypothesis which we desire to verify. Accordingly the distribution 
of successes in m trials is given by the expansion of the binomial $(p+q)^m$. If m
be moderately large and $p \sim q$ small, the ordinates of a “normal” curve with
Standard Deviation $\sqrt{mpq}$ are a close approximation to the terms of the binomial
and the “probable error” of $mp$ is $\pm\,.67449\sqrt{mpq}$. This is the classical text-book
case. Its limitations are obvious. If either p or q be very small, unless m is very
large indeed, and for all values of p and q when m is very small, the normal curve
does not approximate closely to the binomial.

Consider this problem. A certain bacillus is stated to occur in the mouths of 
2 per cent. of all normal persons. Twenty persons have been examined and the 
bacillus was isolated from two of them. Is this observation consistent with the 
truth of the hypothesis ? 

Let us find the chance that in 20 trials two or more successes would be met
with if p = .02, q = .98.

By direct calculation we find this chance to be about 1 in 17. If we use a
“normal” curve with Standard Deviation $\sqrt{20 \times .02 \times .98}$, the chance proves to
be rather less than 1 in 25, or the probability determined in this way is only
68 per cent. of the real value. Of course when the number of trials is so
small we could not expect a continuous function effectively to represent the
binomial expansion, but even for m large the inadequacy of the “normal” curve,
in the case of $p \sim q$ not small, must be insisted upon. I think the best way of
making this clear arithmetically is from a consideration of the moment coefficients
of the binomial $(p + q)^m$.
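The comparison just made can be checked numerically. The following is a minimal modern sketch in Python, not part of the original memoir: the function names are hypothetical, and the normal-curve area is taken from the ordinate 1.5 upwards (half a unit below two successes), which is an assumption about how the approximate figure was reached.

```python
from math import comb, erf, sqrt

def binomial_upper_tail(m, p, k):
    """Exact chance of k or more successes in m independent trials."""
    q = 1.0 - p
    return sum(comb(m, r) * p**r * q**(m - r) for r in range(k, m + 1))

def normal_upper_tail(m, p, k):
    """Area of the normal curve (mean mp, S.D. sqrt(mpq)) above k - 0.5."""
    mean = m * p
    sd = sqrt(m * p * (1.0 - p))
    z = (k - 0.5 - mean) / sd
    return 0.5 * (1.0 - erf(z / sqrt(2.0)))

exact = binomial_upper_tail(20, 0.02, 2)
approx = normal_upper_tail(20, 0.02, 2)
print(f"direct calculation: {exact:.4f}, about 1 in {1 / exact:.0f}")
print(f"normal curve:       {approx:.4f}, about 1 in {1 / approx:.0f}")
print(f"ratio of the two:   {approx / exact:.2f}")
```

Under these assumptions the normal curve gives roughly two-thirds of the true chance, in line with the figure quoted in the text.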

With the ordinary notation we have :

$$\mu_2 = c^2\,mpq,$$
$$\mu_3 = c^3\,mpq\,(p - q),$$
$$\mu_4 = c^4\,mpq\,\{1 + 3(m - 2)pq\},$$

and
$$\beta_1 = \frac{\mu_3^2}{\mu_2^3}, \qquad \beta_2 = \frac{\mu_4}{\mu_2^2}.$$

In cases like the present, c may be taken as unity.
For a “normal” curve to be a good fit to the binomial, $\beta_1$ should be very small
and $\beta_2$ nearly equal to 3.
Take as an illustration the values of $\beta_1$ and $\beta_2$ for different values of m where
p = .02, q = .98.
We obtain:

      m       $\beta_1$      $\beta_2$

     100     .4702     3.4502
     200     .2351     3.2251
     300     .1567     3.1501
     400     .1176     3.1126
     500     .0940     3.0900
     600     .0784     3.0750
     700     .0672     3.0643
     800     .0588     3.0563
     900     .0522     3.0500
    1000     .0470     3.0450
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From these figures it is plain that even comparatively large values of m do not 
admit of the binomial being closely approximated to by a “normal” curve. 
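The tabulated constants follow directly from the moment formulae above with c = 1. As a hedged illustration (the function name `binomial_betas` is hypothetical, and Python is used only for convenience), the table can be regenerated as follows:

```python
def binomial_betas(m, p):
    """Return (beta_1, beta_2) of the binomial (p + q)^m, taking c = 1."""
    q = 1.0 - p
    mu2 = m * p * q
    mu3 = m * p * q * (p - q)
    mu4 = m * p * q * (1.0 + 3.0 * (m - 2) * p * q)
    return mu3**2 / mu2**3, mu4 / mu2**2

# Reproduce the table for p = .02, q = .98.
for m in range(100, 1001, 100):
    b1, b2 = binomial_betas(m, 0.02)
    print(f"{m:5d}  {b1:.4f}  {b2:.4f}")
```

Since $\beta_1$ depends on $(p - q)^2$, the sign convention chosen for $\mu_3$ does not affect the result.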

Since, however, direct evaluation of the terms of the binomial is very tedious 
when m is at all large, we need a curve which bears the same relation to the skew 
binomial that the normal curve does to the symmetrical binomial. Such a function 
was provided years ago by Pearson*, viz. his Skew Curve of Type III, 


$$y = y_0\,e^{-\gamma x}\left(1 + \frac{x}{a}\right)^{\gamma a},$$


+ 
h -] = — 1 
where 8 ( 7 5) : 
mpq m 
y ==—— (taking the unit of measurement c= 1 as before), 
PY 
ys*.e-* 


Oe Patsy 

To use this curve with the rapidity possible in the case of a “normal” curve, 
we need tables not at present published. 

In any particular case, however, the curve may be calculated and the area 
between assigned ordinates approximated to with little labour. 

To sum up, we have the following rules for practical work when p is known or
assumed.

(1) When m is small, say less than 25, the binomial expansion should be
directly evaluated.

(2) When m is moderately large and p or q not small, say not less than .1,
the ordinary method based on the “normal” curve can be trusted.

(3) If m is moderately large and p or q less than .1, a skew curve of Type III
should be fitted from the momental constants of the binomial and the areas
between assigned ordinates estimated with the help of quadrature formulae.


III. 


If in n trials an event happened p times and failed q times, what is the probable
distribution of successes and failures in m subsequent trials and what are the 
respective chances of 0, 1, 2, ... m successes in m trials, it being assumed that the 
occurrences are independent and that the “universe” of events is indefinitely greater 
than n +m? 

* For a recent précis of the relevant facts, see Pearson, K. “On the Curves which are most suitable
for describing the Frequency of Random Samples of a Population,” Biometrika, 1906, Vol. v. p. 172.

This problem is of fundamental importance. We note at once that when the
last condition is imperfectly fulfilled an important special case may arise, for we
then have $\frac{n+m}{N}$ finite, where N is the number of events comprising the “universe”
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or “population” from which the samples come. This problem of drawing from 
a “limited universe” will not be considered in the present memoir; it has been 
discussed in the paper of Pearson last cited *. 

The class of problem to which attention is now directed may be typified as 
follows :— 

Fifteen “control” rats have been inoculated with a constant dose of a standard 
culture of plague bacilli and twelve succumbed in a certain time. Ten similar rats 
have been immunised by a method it is desired to test and five of these died after 
inoculation with a dose of culture similar to that employed upon the “ controls.” 
What is the probability that the deviation from the rate of mortality obtaining 
among the “controls” is a chance event ? 

Evidently the methods of pp. 69—72 cannot be used. To state that the a priori 
chance of dying is ‘8 is to ignore the fact that the size of the “control” sample 
does not justify us in assuming that its proportional yield approximates at all
closely to that of the whole population. 

Let us, then, start from first principles, merely assuming (an assumption based 
on or supported by the fairly wide practical experience of civilised humanity) that 
all possible events are, in the absence of any grounds for inference, equally likely 
(Bayes’ principle). 

On this assumption, we have, by Bayes’ Theorem, for the chance $P_x$ that the
true probability of an event, observed to happen p and fail q times in n trials, is
between $x$ and $x + \delta x$:

$$P_x = \frac{x^p (1 - x)^q\,dx}{\int_0^1 x^p (1 - x)^q\,dx}.$$

A second trial of m being made, the total chance of its yielding r successes
and m − r failures is:

$$\frac{m!}{r!\,(m - r)!}\;\frac{\int_0^1 x^{p+r}(1 - x)^{q+m-r}\,dx}{\int_0^1 x^p (1 - x)^q\,dx} \qquad (1).$$


This is, in modern notation, the result contained in the 7th of Condorcet’s
problems published in his Essai, 1785†, but Laplace had, eleven years previously,
given the theorem with the omission of the term $\frac{m!}{r!\,(m-r)!}$ (i.e. working on the
standard model of an urn from which balls are drawn, he assumed the drawings to
have been made in an assigned order).

To Pearson‡, whose symbols will be used, belongs the credit of emphasizing the
enormous statistical value of the theorem.

* See Pearson, op. cit. pp. 173–5.

† See Todhunter’s History of the Theory of Probability, p. 383, and for a similar result obtained by
a different process in 1795 by Prevost and Lhuilier, op. cit. p. 453.

‡ Karl Pearson, “On the Influence of Past Experience on Future Expectation,” Philosophical
Magazine, 1907, p. 365.

The usual method of treating (1) has
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been to show that, under certain conditions, the probabilities of different values of 
r can be represented by the ordinates of a “normal” curve*. The nature of the
assumptions involved will be placed in the clearest light by the following 
considerations. 
Substituting 0, 1, 2, ... m for r successively in (1), reducing to B and then to
Γ functions and finally evaluating each term, we have for the chances of 0, 1, 2, ... m
successes in a sample of m after a first sample n = p + q:

$$C_0\left\{1 + \frac{m}{1!}\,\frac{p+1}{q+m} + \frac{m(m-1)}{2!}\,\frac{(p+1)(p+2)}{(q+m)(q+m-1)} + \dots\right\} \qquad (2),$$

where
$$C_0 = \frac{\Gamma(q+m+1)\,\Gamma(n+2)}{\Gamma(q+1)\,\Gamma(n+m+2)}.$$
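For whole-number p and q the Beta functions in (1) reduce to factorials, so the terms of series (2) can be evaluated exactly. The sketch below is a modern illustration under that assumption, with hypothetical function names; it applies the result to the rat experiment described earlier, counting a death as a “success” (n = 15, p = 12 for the controls, m = 10 for the immunised animals).

```python
from fractions import Fraction
from math import comb, factorial

def second_sample_chances(n, p, m):
    """Chances of r = 0..m successes in m trials, after p successes in n trials."""
    q = n - p

    def beta(a, b):
        # B(a + 1, b + 1) = a! b! / (a + b + 1)! for whole numbers a and b.
        return Fraction(factorial(a) * factorial(b), factorial(a + b + 1))

    base = beta(p, q)
    return [comb(m, r) * beta(p + r, q + m - r) / base for r in range(m + 1)]

# Rat experiment: 12 deaths among 15 controls, then 5 deaths among 10 treated.
chances = second_sample_chances(15, 12, 10)
at_most_5 = sum(chances[:6])
print(f"chance of 5 or fewer deaths in 10: {float(at_most_5):.4f}")
```

Exact fractions are used so that the terms can be checked to sum to unity without rounding error.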
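We may notice that, if p and q are both very large as compared with m,
(2) reduces to

$$\left(\frac{q}{n}\right)^m\left(1 + \frac{m}{1!}\,\frac{p}{q} + \frac{m(m-1)}{2!}\,\frac{p^2}{q^2} + \text{etc.}\right)
= \left(\frac{q}{n}\right)^m\left(1 + \frac{p}{q}\right)^m = (\bar{p} + \bar{q})^m, \quad \text{where } \bar{p} = \frac{p}{n} \text{ and } \bar{q} = \frac{q}{n} \qquad (2)\ \text{bis}.$$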

The conditions for the approximation of this binomial to the “normal” curve 
have already been noted. 

More directly, the approach of (2) to a “normal” form can be examined by
treating the series in brackets, which is a hypergeometric series having as
parameters

$$\alpha = -m, \quad \beta = p + 1, \quad \gamma = -(q + m), \quad \delta = 1,$$

by the method of moments and then noting the conditions under which the
momental constants $\beta_1$ and $\beta_2$ approximate to the values 0 and 3 respectively.
This method was adopted by Pearson who had, several years before the date of
the publication last cited, obtained the moment coefficients of a hypergeometric
series.

The results are that: 














(14.9 (@-DY 
pn ee (8) 
1 m (p+1)(q¢+1) 1 m— 1 Tee e eee ee ee eee eee eee eee eee ee ’ 
n+3 
m—1 8 
ppt ly 8418) 
9 ae 5/ 
B,=8(1-=) n+4 m-2 n+5 
‘ 14 "-! 
n+3 
_m—-1 m—2 
(n+2P_ + wag (1 +Ss5) (4) 
m(p+1)(q+1) ne, ral as ; 


‘+ n+3 


* See, for instance, Czuber’s Wahrscheinlichkeitsrechnung (1903 Edition), pp. 151 etc.
† Karl Pearson, “On Certain Properties of the Hypergeometrical Series, and on the fitting of such
Series to Observation Polygons in the Theory of Chance,” Philosophical Magazine, 1899, p. 236.
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If we write p=" "Be @ e=P=4 


rat - and if m and n are both absolutely large, 
: m\? 

apy (1 +25) 
q-py__(1*?5) 


we find for (3) and (4) 





y= — (+e) @ne) ee OS eee (3 A), 
-+ = 
m m 
Pe pene ee een (+ 5) (4) 
m (p+ e)(G—€) +2 
If now m be small relatively to n, 
<-oe a ee 5 
= m+ =e) and B,=3+ ntetou_o (5). 
If n be small relatively to m, 
<2 ak Bee acces, (6). 


~ n(p+e)(G—e) "" n{p +e) (G—-e) 
After exhibiting these results, Pearson remarks*: “Both forms result, for n or
m large and the product of either with $\bar{p}$ and $\bar{q}$ not small, in $\beta_1 = 0$ and $\beta_2 = 3$,
i.e. in the symmetry and mesokurtosis, which are for practical purposes closely
enough represented by the Gaussian curve. But if m and n be commensurable,
and either $\bar{p}$ or $\bar{q}$ moderately small, this result by no means follows.”
It is accordingly plain that in all cases of m and n both small the use of
a “normal” curve with S.D. $= \sqrt{m\bar{p}\bar{q}}$ is inappropriate. When p = q the condition
of symmetry is fulfilled and the divergence from “normality” reduces itself
to the difference between the Gaussian and Pearson Type II curves. The
accompanying table illustrates this in a particular example.


A Second Sample of 10, after a first Sample of 100; p = q = 50.

Comparison of Series with Curves (Totals = 100).

Successes   Hypergeometric   Normal Curve,                  Normal Curve,                      Curve of
            Series           S.D. $\sqrt{m\bar{p}\bar{q}}$   S.D. $\sqrt{(m+1)\bar{p}\bar{q}}$   Type II

    0           .146             .221                           .333                             .188
    1          1.243            1.122                          1.408                            1.331
    2          4.931            4.349                          4.843                            5.091
    3         12.017           11.447                         11.702                           12.107
    4         19.922           20.452                         19.728                           19.729
    5         23.480           24.817                         23.972                           23.105
    6         19.922           20.452                         19.728                           19.729
    7         12.017           11.447                         11.702                           12.107
    8          4.931            4.349                          4.843                            5.091
    9          1.243            1.122                          1.408                            1.331
   10           .146             .221                           .333                             .188

Type II curve as printed: y = 23.412 (1 − x²/…)^10.29177.
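The first three numerical columns of this comparison can be recomputed as a check. The sketch below is a modern illustration with hypothetical function names. It assumes that each normal-curve entry is the area between the ordinates r − .5 and r + .5, and that the end classes (0 and 10) absorb the whole of the corresponding tail.

```python
from fractions import Fraction
from math import comb, erf, factorial, sqrt

def series_percentages(n, p, m):
    """Percentage frequencies from series (2), for whole-number p and q."""
    q = n - p

    def beta(a, b):
        # B(a + 1, b + 1) = a! b! / (a + b + 1)!
        return Fraction(factorial(a) * factorial(b), factorial(a + b + 1))

    base = beta(p, q)
    return [float(100 * comb(m, r) * beta(p + r, q + m - r) / base)
            for r in range(m + 1)]

def normal_percentages(m, mean, sd):
    """Normal-curve areas for each r; end classes take the whole tail (assumed)."""
    def cdf(x):
        return 0.5 * (1.0 + erf((x - mean) / (sd * sqrt(2.0))))

    areas = []
    for r in range(m + 1):
        lower = cdf(r - 0.5) if r > 0 else 0.0
        upper = cdf(r + 0.5) if r < m else 1.0
        areas.append(100.0 * (upper - lower))
    return areas

series = series_percentages(100, 50, 10)
normal_a = normal_percentages(10, 5.0, sqrt(10 * 0.25))
normal_b = normal_percentages(10, 5.0, sqrt(11 * 0.25))
for r in range(11):
    print(f"{r:2d}  {series[r]:7.3f}  {normal_a[r]:7.3f}  {normal_b[r]:7.3f}")
```

The Type II column is not recomputed here, since one constant of the fitted curve is not legible in the source.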


* Op. cit. (1907), pp. 371—2. 
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A few arithmetical results may now be given. 








Let n = 100 and m = 50.

                                     From Series (2)     Area* of “Normal” Curve,
                                                         with S.D. $\sqrt{m\bar{p}\bar{q}}$
$\bar{p} = .4$, $\bar{q} = .6$
    Chance of 20–22 Successes            .255†                 .271
$\bar{p} = .1$, $\bar{q} = .9$
    Chance of 5–7 Successes              .3193                 .4739
        ”     0–2     ”                  .1311                 .1145
$\bar{p} = .01$, $\bar{q} = .99$
    Chance of 0–2 Successes              .8938                 .9202
        ”      …      ”                  .1007                 .0022

We see how the liability to error increases with $\bar{p} \sim \bar{q}$.

An interesting special case may be discussed here which emphasizes the
importance of the problem indicated.


Suppose the first sample has given all successes or all failures, so that p or
q = 0, how are we to measure its reliability?

Many unsophisticated users of formulae must have been puzzled by this case,
since, construing the formulae au pied de la lettre, it would appear that after
n successes in n trials, we ought to get m successes in m with a probable error
of 0!

The paradox vanishes if we consider (2). Put in it n = p and we have

$$\frac{m!\,(n+1)!}{(n+m+1)!}\left\{1 + \frac{n+1}{1!} + \frac{(n+1)(n+2)}{2!} + \dots\right\} \qquad (7).$$

From this we see that the ratio of the (m+1)th term to the whole sum (i.e. the
chance of m successes in m trials) is $\frac{n+1}{n+m+1}$; from which we conclude:

(a) Only when n is very large as compared with m does the chance of
obtaining 100% successes in m trials approach unity.

(b) In particular if n = m and both are large, the chance is about .5.


For instance if we have had 100% successes in 200 trials the chance of getting
the same proportion in a subsequent 50 is about 4 to 1. If, on the other hand,
n = 50 and m = 200 it is 1 to 4.

In view of what follows it may be worth noticing that a closed expression for
the sum of any number of terms of (7) can readily be given.

* Taking for area corresponding to x successes, the area between the ordinates x − .5 and x + .5.

† Approximate only, obtained by using Stirling’s theorem in the expression
$$\frac{m!\,(p+r-1)!\,(q+m-r+1)!}{(m-r+1)!\;p!\;(q+m)!\;(r-1)!}$$
to find the rth term of series (2).
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Write in Euler’s identity * 


$$1 - a_1 + a_1(1 - a_2) + a_1 a_2(1 - a_3) + \dots + a_1 a_2 \dots a_n(1 - a_{n+1}) = 1 - a_1 a_2 a_3 \dots a_{n+1},$$


cnc, ye 
y Yr pr 
Multiply by s — and subsequently put y = 0. 
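We have
$$1 + \frac{x}{p_1} + \frac{x(x + p_1)}{p_1 p_2} + \dots = \left(\frac{x + p_1}{p_1}\right)\left(\frac{x + p_2}{p_2}\right)\dots\left(\frac{x + p_n}{p_n}\right) \qquad (8).$$
Putting n + 1 = x, $p_1 = 1$, $p_2 = 2$, ... $p_m = m$,
$$1 + \frac{n+1}{1!} + \frac{(n+1)(n+2)}{2!} + \dots = \prod_{y=1}^{y=m}\left(1 + \frac{n+1}{y}\right) \qquad (9).$$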
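Both the closed sum (9) and the chance of an unbroken run of successes deduced from (7) can be verified with exact arithmetic. The following sketch is a modern illustration; the function names are hypothetical, and (9) is read as summing the first k + 1 terms of the bracketed series in (7) when the product runs from y = 1 to y = k.

```python
from fractions import Fraction

def series_7_terms(n, m):
    """The m + 1 bracketed terms of (7): 1, (n+1)/1!, (n+1)(n+2)/2!, ..."""
    terms = [Fraction(1)]
    for k in range(1, m + 1):
        terms.append(terms[-1] * Fraction(n + k, k))
    return terms

def closed_sum(n, k):
    """Product form of (9) for the sum of the first k + 1 terms."""
    product = Fraction(1)
    for y in range(1, k + 1):
        product *= 1 + Fraction(n + 1, y)
    return product

def chance_all_successes(n, m):
    """Chance of m successes in m trials after n successes in n trials."""
    terms = series_7_terms(n, m)
    return terms[-1] / sum(terms)

print(chance_all_successes(200, 50))   # first sample of 200, second of 50
print(chance_all_successes(50, 200))   # first sample of 50, second of 200
```

The two printed fractions correspond to the “about 4 to 1” and “1 to 4” cases mentioned in the text.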


Reverting to the general case, we note that for testing the divergence between
first and second samples the formula (2) must always be employed when m and n
are commensurable and $\bar{p} \sim \bar{q}$ not small. This rule certainly applies to all cases of
m and n less than 300 or 400 and $\bar{p}$ (or $\bar{q}$) < .1. If m and n be large the best
plan will be to fit to (2) the curve indicated by the momental constants, using its
proportional areas (obtained by some convenient quadrature formula) precisely in
the manner adopted with the tabled areas of the “normal” curve.


Such a method is, however, not convenient for laboratory workers nor specially 
appropriate when m is a small number, since in that case the terms of the dis- 
continuous series are not closely represented by a continuous curve. 


Evidently what one needs is a tabulation of the series (2) for different values of 
m, n and p. 


Were it possible to obtain a simple formula for the sum of any assigned
number of terms of (2), the computation of such a table would be a rapid process.

In the particular case p = 0 or n, such a formula has been given above. In the
general case I have not reached any result† and more widely trained mathematicians,
who have kindly allowed me to consult them, do not regard the problem as a
simple one.

I therefore fell back upon the method of direct calculation. This is a
straightforward but irksome task‡.

* See Chrystal’s Algebra, Vol. 1. p. 392, Ed. 1889. 

† Formulae are available in certain types of Hypergeometric Series. Vide M. J. M. Hill, “On a
Formula for the Sum of a Finite Number of Terms of the Hypergeometric Series when the Fourth
Element is equal to Unity,” Proc. Lond. Math. Soc. 1907, Series 2, Vol. v. p. 335; and 1908, Series 2,
Vol. vi. p. 339. The methods of these papers cannot be used in the present case.

‡ Sir Ronald Ross and Mr W. Stott have recently published (“Tables of Statistical Error,” Annals
of Tropical Medicine and Parasitology, Vol. v. No. 3, 1911) a set of tables for the use of laboratory
workers. Their tables will be of great service in the cases in which p is not less say than 0.1, but are
not, I think, available for the class of problem discussed in this paper, since they appear to be based
on the “normal” theory of errors. It must be noticed that in an immense number of examples which
arise in medical work p will not only be less than 0.1 but less than 0.01 (the prevalence of mental defect
in children, albinism, epilepsy, etc. are instances), and for such cases the “normal” treatment is, as
pointed out above, inappropriate and often misleading.
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For the benefit of those who wish to extend my small tables, it may be worth 
indicating the arithmetical arrangement which I have found most convenient. 
I use the following scheme:

    n, m, p, q                                           $1/C_0$
    q + m                                                $100\,C_0$

    Term                            Multiplier
    1                     (α)       m(p+1) / ((q+m)·1)             (a)
    m(p+1) / (q+m)        (β)       (m−1)(p+2) / ((q+m−1)·2)       (b)
    βb                    (γ)       (m−2)(p+3) / ((q+m−2)·3)       (c)
    γc                    (δ)       ...                            (d)

The values of n, m, p, q and q + m are written at the top of the sheet, $1/C_0$ and
$100\,C_0$ are calculated and written in the right-hand top corner.

Two columns are next formed; the entries in the right-hand column having
been made, any given term of the left-hand column is the product of the entries
in the columns immediately above it. The entries in the left-hand column are
added up and the sum checked by comparing it with $1/C_0$. Finally each term is
converted into a percentage by multiplying with $100\,C_0$.
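The worksheet just described translates almost line for line into a short routine. The version below is a modern sketch with hypothetical names, not the procedure as originally carried out on a mechanical calculator; it builds the multiplier column, forms each term from the term and multiplier above it, checks the column total against $1/C_0$, and converts to percentages.

```python
from fractions import Fraction

def worksheet(n, p, m):
    """Percentage frequencies of 0..m successes, following the two-column scheme."""
    q = n - p
    # Right-hand column: multipliers (a), (b), (c), ...
    multipliers = [Fraction((m - k) * (p + k + 1), (q + m - k) * (k + 1))
                   for k in range(m)]
    # Left-hand column: each term is the product of the term and multiplier above it.
    terms = [Fraction(1)]
    for multiplier in multipliers:
        terms.append(terms[-1] * multiplier)
    # 1/C_0 written as a product, for whole-number p and q.
    inverse_c0 = Fraction(1)
    for i in range(1, m + 1):
        inverse_c0 *= Fraction(n + 1 + i, q + i)
    # The check prescribed in the text: the column total must equal 1/C_0.
    if sum(terms) != inverse_c0:
        raise ArithmeticError("column total does not agree with 1/C_0")
    return [float(100 * term / inverse_c0) for term in terms]

percentages = worksheet(6, 0, 5)
print([round(value, 4) for value in percentages])
```

With n = 6, p = 0, m = 5 this gives the first column of the published table, to within rounding in the last figure.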

In this way, provided one has a mechanical calculator, a series having only 
a moderately large number of arithmetically significant terms is rapidly evaluated. 
Still, when all is said, the calculation of a table for values of m and n ranging from 
say five to a hundred and $\bar{p}$ from 0 to .1 would need an heroic amount of patience.
Even the present admittedly imperfect results have involved the expenditure of 
some little time and effort* and it was necessary to consider how best to utilise 
our limited resources. 


Having chiefly before my eyes the needs of laboratory workers, I felt sure that 
the cases of m and n not greater than 25 were of the most importance. Probably 
in the type of problem alluded to on p. 73 the “control” should be regarded as 
our n and it is usually possible to arrange the experiments in such a fashion that 
animals at least equal in number to those specially tested serve as the control. 
When it is possible to plan a large control, it is usually practicable to fix the 


* I desire heartily to thank my assistant, Mr J. W. Brown of the Lister Institute Statistical 
Department, to whose zealous co-operation in the arithmetical work I am greatly indebted. 


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number arbitrarily, so that it seemed sufficient to give in tabular form the results 
of small samples after first samples of 50 and 100 without calculating the
intermediate cases.


The next question is as to whether, within the limited field chosen, interpolation 
can be trusted. Accurate methods of interpolation in the case of double-entry 
tables are a little complex* and not likely to appeal to the man in the laboratory. 
What is really material is whether simple interpolation is likely to lead to seriously 
erroneous conclusions. 


I now proceed to some tests. 


(1) A first sample of 17 having given 3 successes, required the probability
that a second sample of 14 will contain 4 or more successes.

From the tables for n = 20, p = 3, m = 10 and n = 15, p = 3, m = 10, we have
for the proportional frequency of 4 or more successes in m trials,

    n = 20:        12.8062
    n = 15:        23.0040
    Difference:    10.1978

which gives by simple interpolation for n = 17, p = 3, m = 10,
    18.92488 (α).

Similarly interpolating between the values for n = 20, p = 3, m = 15 and n = 15,
p = 3, m = 15, we have for n = 17, p = 3, m = 15,
    38.92372 (β).

Interpolating between (α) and (β) we reach for the proportional frequency of
4 or more successes in 14 trials after n = 17, p = 3, 34.92.
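The arithmetic of this test, together with the direct calculation against which it is compared in the next paragraph, can be sketched as follows. This is a modern illustration with hypothetical function names; the tabulated inputs (12.8062, 23.0040 and 38.92372) are taken from the text as given, and the direct value assumes whole-number p and q so that the Beta functions reduce to factorials.

```python
from fractions import Fraction
from math import comb, factorial

def chance_at_least(n, p, m, k):
    """Percentage chance of k or more successes in m trials, after p successes in n."""
    q = n - p

    def beta(a, b):
        # B(a + 1, b + 1) = a! b! / (a + b + 1)!
        return Fraction(factorial(a) * factorial(b), factorial(a + b + 1))

    total = sum(comb(m, r) * beta(p + r, q + m - r) for r in range(k, m + 1))
    return float(100 * total / beta(p, q))

def interpolate(x, x0, y0, x1, y1):
    """Simple (linear) interpolation between two tabulated values."""
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

alpha = interpolate(17, 15, 23.0040, 20, 12.8062)    # n = 17, m = 10
beta_value = 38.92372                                # n = 17, m = 15, as quoted
estimate = interpolate(14, 10, alpha, 15, beta_value)
true_value = chance_at_least(17, 3, 14, 4)
print(f"interpolated: {estimate:.2f}")
print(f"direct:       {true_value:.5f}")
print(f"error:        {100 * (true_value - estimate) / true_value:.2f} per cent.")
```

The same routine also recovers the tabulated entry for n = 15, p = 3, m = 10, which gives some assurance that it matches the method used to construct the tables.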

The true value obtained by direct calculation is 35.07601, which gives an error
of .43% in the interpolated value, a difference of no importance for such
purposes as the present. In the accompanying table I have grouped together the
results of a number of random trials made in different parts of the table.
A perusal of these results leads, I think, to the following conclusions. 


(1) For values of m and n ranging in each case up to 25, interpolation, when
necessary, gives results of sufficient exactitude for all the purposes likely to be
served by such tables.


(2) For greater values of m and n, particularly when the latter is greater than 
50, the differences are too great to allow of interpolation and for such values the 
table can only provide the reader with a general impression (which is often enough 
sufficient) as to the limits within which possible variations from the proportions 


* Vide W. Palin Elderton, “Interpolation by Finite Differences (Two Independent Variables),”
Biometrika, 1903, Vol. ii. p. 105; W. Palin Elderton, “Some Notes on Interpolation in n-dimension
Space,” ibid. 1908, Vol. vi. p. 94; also the Introduction to the British Association Tables of F (r, v) and
H (r, v) Functions (issued by B. A. 1899, p. 56).
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found in the first sample are likely to fall. As remarked above, it was thought 
that when the first sample exceeded fifty its numerical composition would often be 
at the choice of the worker. I think, therefore, that these tables are likely to 
serve most of the objects I had in mind when the work was undertaken, although 
it is much to be desired that someone will have leisure considerably to extend 
them. I do not see any immediate prospect of being that person. 


The class of problem in which this species of investigation seems desirable has 
already been described and the reader is perhaps not anxious to see any more 
arithmetical examples. I may, however, give a single concrete instance of the 
kind of research in which, I hope, the tables will be of value. 


Tests of the Accuracy of Simple Interpolation.

[The True Values are given in brackets.]

Example I. n = 22, p = 4, m = 16.
    0–3 Successes       1–3 Successes       3 Successes
    57.38 (57.49)       51.27 (51.65)       18.70 (19.14)

Example II. n = 37, p = 4, m = 22.
    0–3 Successes       1–2 Successes       3 Successes
    62.85 (67.89)       36.47 (39.83)       16.64 (18.88)

Example III. n = 71, p = 4, m = 43.
    0–2 Successes       1–3 Successes       3 Successes
    46.43 (47.53)       50.60 (56.66)       15.21 (18.25)

Example IV. n = 100, p = 4, m = 39.
    0–3 Successes       1–2 Successes       3 Successes
    83.84 (84.93)       48.59 (50.78)       13.78 (15.15)


In a paper by Rous*, several experiments of the following kind are detailed. 


15 mice† were injected intraperitoneally with a suspension of mouse embryo in
normal saline and 11 days later reinjected with the same substance. Ten days 
after the second injection they were inoculated subcutaneously with a mass made 
from mouse embryos 1°5 cm. long, and 17 previously untreated mice were inoculated 
at the same time to serve as a control. 


In only one of the 17 control mice was no graft found at the autopsy, but 8 of
the treated mice did not “take.” If we wish to know whether this difference
be an effect of the intraperitoneal inoculations, we may put n = 17, p = 1, and
ascertain the chance that in m = 15 there would be 8 or more successes.

* “An Experimental Comparison of Transplanted Tumour and a Transplanted Normal Tissue
Capable of Growth,” by Peyton Rous, M.D., Journ. Experimental Medicine, 1910, Vol. xii. p. 344.

† I take the number stated in the text but can only identify 14 in the corresponding table.

From the tables, with interpolation, I find the odds against such a result to be
about 260 to 1. In other words it is very likely that the treatment has led to the
observed result. It may be remarked, however, that had we used a normal curve
with S.D. $\sqrt{15 \times \frac{1 \times 16}{17^2}}$, the odds estimated therefrom would have been enormously
greater.
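The contrast drawn here can be made concrete by evaluating both estimates directly. The sketch below is a modern illustration with hypothetical function names. It computes the chance from series (2) without interpolation, so the result need not coincide exactly with the interpolated 260 to 1, and it assumes the normal-curve area is measured above the ordinate 7.5.

```python
from fractions import Fraction
from math import comb, erfc, factorial, sqrt

def chance_at_least(n, p, m, k):
    """Chance of k or more successes in m trials, after p successes in n trials."""
    q = n - p

    def beta(a, b):
        # B(a + 1, b + 1) = a! b! / (a + b + 1)!
        return Fraction(factorial(a) * factorial(b), factorial(a + b + 1))

    total = sum(comb(m, r) * beta(p + r, q + m - r) for r in range(k, m + 1))
    return float(total / beta(p, q))

# Mouse experiment: n = 17, p = 1 in the control; 8 or more in m = 15 treated.
from_series = chance_at_least(17, 1, 15, 8)

# Normal curve with mean 15/17 and S.D. sqrt(15 * 16 / 17^2); area above 7.5.
mean = 15 * 1 / 17
sd = sqrt(15 * (1 * 16) / 17**2)
from_normal = 0.5 * erfc((7.5 - mean) / (sd * sqrt(2.0)))

print(f"series (2):   odds against about {1 / from_series - 1:.0f} to 1")
print(f"normal curve: chance about {from_normal:.1e}")
```

The complementary error function is used for the normal tail because the area is far too small to be obtained accurately by subtraction from unity.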


In conclusion, I desire to refer to a subject indirectly related to the topic of this 
paper and, I think, of importance. We are familiar with such arrangements of 
material as the following. $n_1$ persons have been immunised against a certain
disease and, having contracted the disease, $a_1$ have died. Of $n_2$ not immunised
$a_2$ have died $\left(\frac{a_1}{n_1} < \frac{a_2}{n_2}\right)$. The extent of protection conferred is then estimated by
some coefficient of correlation or association. The trustworthiness of the coefficient
so calculated is then measured by a comparison between its arithmetical value and
that of its standard deviation or “probable error.” This process has its limitations.
If, for instance, either $a_1$ or $a_2$ be zero, Yule’s coefficients of association and
colligation become unity and their standard deviations are indeterminate.


I would put forward for consideration the possibility that the use of Bayes’ 
theorem might here be of value. Thus: 


Let the chance of $a_2$ or more successes in $n_2$ after $a_1$ successes in $n_1$ be $p_1$ and
the chance of $a_1$ or less successes in $n_1$ trials after $a_2$ successes in $n_2$ trials be $p_2$.
Then, since either $n_1$ or $n_2$ might have been drawn first, a measure of the probability
of the observed result will be $\frac{p_1 + p_2}{2} = P$.

We might indeed adopt a scale of reliability by putting $P = f\left(\frac{p_1 + p_2}{2}\right)$, the
function being such that P increases to unity as $\frac{p_1 + p_2}{2}$ diminishes to zero.

I put forward these suggestions with some doubt, but I cannot help feeling 
sure that in such cases as those I have instanced the ordinary method of testing 
the reliability of a coefficient of association is a dangerous and possibly misleading 
artifice*. 

In conclusion I desire to express my regret that this paper is so imperfect. 
The problems treated require mathematical abilities and training not at my 
command. I have only ventured to write upon the subject because of its practical 
importance and may, perhaps, venture to entertain the hope that my numerous 
mistakes of omission and commission will be leniently treated. 


* [The probability corresponding to the χ² of the fourfold table can be calculated straight away; but 
the difficulty arises from our not mentally appreciating grades of probability with the readiness we 
appreciate grades of correlation on a limited scale. Editor.] 
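
The percentage frequencies in the table that follows appear consistent with a predictive distribution of the beta-binomial kind, that is, Bayes' theorem with a uniform prior on the chance of a success. The sketch below, under that assumption, regenerates one column so that garbled entries can be checked; `percentage_column` is a hypothetical helper name, not something defined in the paper.

```python
from math import comb, factorial


def percentage_column(n, m, p):
    """Percentage frequency of q = 0..m successes in a second sample of m,
    after p successes in a first sample of n (uniform prior assumed)."""
    column = []
    for q in range(m + 1):
        num = comb(m, q) * factorial(p + q) * factorial(n - p + m - q) * factorial(n + 1)
        den = factorial(n + m + 1) * factorial(p) * factorial(n - p)
        column.append(100.0 * num / den)
    return column


# First column of the table: n = 6, m = 5, p = 0.
for q, value in enumerate(percentage_column(6, 5, 0)):
    print(q, f"{value:.4f}")
```

The printed table seems to truncate rather than round the fourth decimal in places, so a regenerated value may differ from the table by one unit in the last figure.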
TABLE. Percentage Frequency of Successes in a Second Sample “m” after 
drawing “p” Successes in a First Sample “n”. 


Successes 
n=6) O 
m=5\) 1 
2 
4 
v0 
n=6| 0 
m=6{ 1 
2 
3 
4 
5 
6 
n=7\ O 
m=5{ 1 
2 
3 
4 
5 
n=7\ O 
m=6{ 1 
2 
3 
4 
6 
6 
n=7) O 
m=7{ 1 
2 
3 
4 
5 
6 
a 
n=8) O 
m=5( 1 
2 
3 
4 
5 
n=8) O 
m=6\ 1 
2 
3 
4 
5 
6 
n=8| O 
m=7{ 1 
2 
3 
4 
5 
6 
7 
p=0 
58°3333 
26°5151 
10°6060 
3°5354 
“8838 
+1263 


53°8462 
26°9231 
12°2378 
4°8951 
1°6317 
“4079 
0582 


61°5385 
25°6410 
9°3240 
2°7972 
6216 
0777 


57°1429 
26°3736 
10°9890 
3°9960 
1°1988 
"2664 
0333 


53°3333 
26°6667 
12°3077 
5°1282 
1°8648 
"5594 
*1243 
0155 


64°2857 
24°7253 
8°2418 
2°2478 
"4495 
0499 


60-0000 
25°7143 
9°8901 
3°2967 
*8991 
1798 
“0200 


56°2500 
26°2500 
11°2500 
4°3269 
1°4423 
"3934 
0787 


0087 


p=1 
31°8182 
31°8182 
21°2121 
10°6060 
3°7879 
‘7576 


26°9231 
29°3706 
22°0280 
13°0536 
6°1189 
2°0979 
4079 


35°8974 
32°6340 
19°5804 
8°7024 
2°7195 
4662 


30°7692 
30°7692 
20°9790 
11°1888 
4°6620 
1°3986 
2331 


26°6667 
28°7179 
21°5385 
13°0536 
6°5268 
2°6107 
‘7615 
"1243 


39°5604 
32°9670 
17-9820 
7°1928 
1‘9980 
*2997 


34°2857 
31°6484 
19°7802 
9°5904 
3°5964 
"9590 
*1399 


30°0000 
30-0000 
20°7692 
11°5385 
5°2448 
1°8881 
“4895 
0699 








p=2 
15:9091 
26°5151 
26°5151 
18°9394 

9°4697 

2°6515 


12°2378 
22°0280 
24°4755 
20°3962 
13°1119 

6°1189 

1°6317 


19°5804 
29°3706 
26°1072 
16°3170 
6°9930 
1°6317 


15°3846 
25°1748 
25°1748 
18°6480 
10°4895 
4°1958 
"9324 


12°3077 
21°5385 
23°4965 
19°5804 
13°0536 
6°8531 
2°6107 
5594 


23°0769 
31°4685 
25°1748 
13°9860 
5°2448 
1°0489 


18°4615 
27 °6923 
25°1748 
16°7832 
8°3916 
2°9370 
5594 


15-0000 
24-2308 
24°2308 
18°3566 
11°0140 
5°1399 
1°7132 
*3147 


25°2525 
25°2525 
17°6768 

7°0707 


4°8951 
13°0536 
20°3963 
23°3100 
20°3963 
13°0536 

4°8951 


9°7902 
21°7560 
27°1950 
23°3100 
13°5975 
4°3512 


6°9930 
16°7832 
23°3100 
23°3100 
17°4825 
9°3240 
2°7972 


5°1282 
13°0536 
19°5804 
21°7560 
19°0365 
13°0536 
6°5268 
1°8648 


12°5874 
25°1748 
27°9720 
20°9790 
10°4895 

2°7972 


9°2308 
20°1398 
25°1748 
22°3776 
14°6853 
6°7133 
1°6783 


69231 
16°1538 
220280 
220280 
17°1329 
10°2797 

4°4056 

1°0489 










p=4 


6°2937 
17°4825 
26°2238 
26°2238 
17°4825 
6°2937 


4°1958 
12°5874 
20°9790 
24°4755 
20°9790 
12°5874 

4°1958 


2°8846 
9°1783 
16°5210 
21°4161 
21°4161 
16°5210 
9°1783 
2°8846 





p=5 











Successes 

n=8, m=8: 0 to 8 
n=9, m=6: 0 to 6 
n=9, m=7: 0 to 7 
n=9, m=8: 0 to 8 
n=9, m=9: 0 to 9 


p=0 
52°9412 
26°4706 
12°3529 
52941 
2°0362 
‘6787 
"1851 
‘0370 
0041 


666667 
23°8095 
7°3260 
1°8315 
*3330 
0333 


62°5000 
250000 
8°9286 
2°7472 
6868 
"1249 
0125 


58°8235 
25°7353 
10°2941 
3°6765 
1°1312 
"2828 
0514 
0051 


55°5555 
26°1438 
11°4379 
4°5752 
1°6340 
5027 
*1257 
0229 
0023 


52°6316 
26°3158 
12°3839 
5°4180 
2°1672 
"7740 
*2381 
0595 
0108 
“0011 


54°5454 
27°2727 
12°1212 
4°5454 
12987 
2165 





M. GREENWOOD 


TABLE—(continued). 


p=1 
26°4706 
28°2353 
21°1765 
13°0317 
6°7873 
2°9617 
1:0366 
+2633 
‘0370 


42°8571 
32°9670 
16°4835 
5°9940 
1°4985 
“1998 


37°5000 
32°1429 
18°5439 
8°2418 
2°8097 
6743 
0874 


33°0882 
30°8824 
19°8529 
10°1810 
4°2421 
1°3883 
*3239 
0411 


29°4118 
29°4118 
20°5882 
11°7647 
5°6561 
2°2624 
*7199 
1645 
0206 


26°3158 
27 °8638 
20°83978 
13°0031 
6°9659 
3°2151 
1°2503 
*3897 
0877 
0108 


27°2727 
30°3030 
22°7273 
12°9870 
5°4112 
1°2987 





p=2 
12°3529 
21-1765 
228054 
190045 
12-9576 
7°2563 
3°2250 
1:0366 
‘1851 


26°3736 
32°9670 
23 °9760 
11-9880 
3°9960 
6993 


21°4286 
29°6703 
24°7253 
14°9850 
6°7433 
2°0979 
3496 


17°6471 
26°4706 
24°4344 
16°9683 
9°2554 
3°8873 
1°1518 
1851 


14-7059 
23°5294 
23°5294 
18°0995 
11°3122 
5°7589 
2°3036 
“6582 
"1028 


12°3839 
20°8978 
22°2910 
18°5759 
12°8602 
7°5018 
3°6372 
1°4029 
3897 
0595 


12°1212 
22°7273 
25°9740 
21°6450 
12°9870 

4°5454 


p=3 
52941 
13-0317 
19-0045 
20°7322 
18°1407 
129000 
7°2563 
2°9617 
‘6787 


15°3846 
27°9720 
27 °9720 
18°6480 
8°1585 
1°8648 


11°5385 
23-0769 
26°2238 
20°9790 
12-2378 

4°8951 

1°0489 


8°8235 
19°0045 
23°7557 
21°5961 
15°1172 
8°0625 
3°0234 
“6170 


6°8627 
15°6863 
21°1161 
21°1161 
16°7969 
10°7500 
5°3750 
1°9197 
“3771 


5°4180 
13°0031 
18°5759 
20°0047 
17°5042 
12°7303 
7°6382 
3°6372 
1°2503 
2381 


4°5454 
12°9870 
21°6450 
25°9740 
22°7273 
12°1212 


p=4 
2-0362 
6°7873 
12-9576 
18-1407 
20°1563 
18-1407 
12-9576 
6°7873 
2-0362 


8°3916 
20°9790 
27 ‘9720 
24°4755 
13°9860 
4°1958 


5°7692 
15°7343 
23°6014 
24°4755 
18°3566 
9°4405 
2°6224 


4°0724 
11°8778 
19°4364 
22°6759 
20°1563 
13°6055 
6°4788 
1°6968 


2°9412 
9°0498 
15°8371 
20°1563 
20°1563 
16°1250 
10°0782 
4°5249 
171312 


2°1672 
6°9659 
12°8602 
17°5042 
19-0955 
17°1859 
12°7303 
7°5018 
3°2150 
“7740 












































On Errors of Random Sampling 


Successes        p=0      p=1      p=2      p=3      p=4      p=5      p=6 
n=10    0    68.7500  45.8333  29.4643  18.1318  10.5769   5.7692 
m=5     1    22.9167  32.7381  33.9972  30.2198  24.0385  17.3077 
        2     6.5476  15.1099  22.6648  27.4725  28.8461  26.9230 
        3     1.5110   5.0366  10.3022  16.4835  22.4259  26.9230 
        4     0.2518   1.1447   3.0907   6.4103  11.2179  17.3077 
        5     0.0229   0.1373   0.4807   1.2820   2.8846   5.7692 
n=10    0    52.3809  26.1905  12.4060   5.5138   2.2704   0.8514 
m=10    1    26.1905  27.5689  20.6767  12.9736   7.0949   3.4056 
        2    12.4060  20.6767  21.8930  18.2441  12.7709   7.6625 
        3     5.5138  12.9736  18.2441  19.4604  17.0278  12.5744 
        4     2.2704   7.0949  12.7709  17.0278  18.3377  16.5039 
        5     0.8514   3.4056   7.6625  12.5744  16.5039  18.0043 
        6     0.2838   1.4190   3.9295   7.8590  12.5030  16.5039 
        7     0.0811   0.4990   1.6841   4.0826   7.8590  12.5744 
        8     0.0187   0.1403   0.5741   1.6841   3.9295   7.6625 
        9     0.0031   0.0284   0.1403   0.4990   1.4190   3.4056 
       10     0.0003   0.0031   0.0187   0.0811   0.2838   0.8514 
m=15| O 76°1905 57°1429 42°1053 30°4094 21°4654 14°7575 9°8383 
m= 5{ 1 19°0476 30:0752 35°0877 35°7757 33°5397 29°5149 24-5958 
2 4:0100 10°0251 16°5119 22°3598 26°8318 29°5149 30°2717 
3 6683 2°3588 5°1599 8°9439 13°4159 18°1631 22°7038 
4 0786 *3685 1°0320 2°2360 4°1280 6°8111 10°3199 
5 0049 0295 "1032 2752 6192. -1°2384 =. 22704 
m=15| O 61°5384 36°9231 21°5385 12°1739 6°6403 3:4783 1°7391 
m=10{ 1 24°6154 30°7692 28°0936 22°1344 15°8103 10°43848 6:4073 
2 9°2308 18°0602 22°9857 23°7154 21°3439 17°2997 12°8146 
3 32107 8°7565 14°5941 18°9723 20°9694 20°5034 18°0913 
4 1°0216 3°6485 7°6619 12°2322 16°3095 18°9958 19-7873 
5 ‘2919 1°3135 3°3874 6°5238 10°3613 14°2469 17°4128 
6 ‘0730 4032 «=1°2546 2°8781 5°3965  8°7064 12°4377 
7 0153 “1024 *3795 =61°0279 §= 22614 42644 = 71078 


8 0025 “0203 “0889 2827 ‘7269 + 1°5991 3:°1094 
9 “0003 0028 0145 0538 1615 “4146 "9423 
10 “0000 “0002 “0012 0054 0189 0565 “1508 


O 51°6129 25°8065 12°4583 5°7842 2°5707 1:0876 "4351 
1 25°8065 26°6963 20°0222 12°8538 7°4156 3:°9155 1:9033 
2 12°4583 20°0222 20°7638 17°3032 12°4583 7:9941  4:6342 
3 5°7842 12°8538 17°3032 17°9953 15°7459 12°0490 98-2152 
4 25707 7°4156 12°4583 15°7459 16°4305 14°7874 11°7361 
5 1°0876 3°9155 7:9941 12°0490 14°7874 15°4916 14-2006 
6 4351 = 1°9033 4°6342 82152 11°7361 14:2006 14:9480 
7 ‘1631 8512 =2°4375 «=5°0297 «82991 11°5313 13°8803 
8 0567 3482 =1°1607 2°7663 5:2415 8:3282 11°4309 
9 0181 *1289 ‘4965 1°3589 2°9443 5°3344 8°3350 
10 "0052 0426 1882 5889 1°4548 3°0006 5°3344 
11 0013 0122 0618 2204 6200 + 1°4548 2°9443 
12 0003 0029 “0169 0689 "2204 5889 1°3589 
18 “0001 “0006 0037 *0170 0618 "1882 4965 
14 “0000 “0001 “0006 “0029 0122 0426 "1290 
15 “0000 “0000 “0000 “0003 0013 0052 0181 





” 


Percentage Frequencies of Successes in a Second Sample “m”. 


6°3246 
19°4604 
29°1906 
26°5369 
14°5953 
3°8921 


*8238 
3°6613 
8°7226 
14°5376 
18°6566 
19°1897 
15°9914 
10°6609 

5°4516 

1°9383 

“3661 


"1631 
*8512 
2°4375 
5°0297 
8°2991 
11°5313 
13°8803 
14°6968 
13°7783 
11°4309 
8°3282 
5°2415 
2°7664 
1°1607 
"3482 
0567 













Successes 


n=20 
n= st 


Me & ®@HO 


n=20) 
m=10{ 


SNA AW MHS 


© 
co 


~ 
~ 


WHOND Nw & WHOS 


n=20| 0 
m=20{ 1 


p=0 
80°7692 
16°1538 
2°6923 
*3512 
0319 
‘0015 


67°7419 
22°5806 
7°0078 
2°0022 
5191 
"1198 
0240 
0040 
0005 
“0000 
“0000 


58°3333 
25°0000 
10°2941 
4°0553 
1°5207 
5396 
“1799 
0558 
0159 
0041 
0010 
“0002 
“0000 
“0000 
“0000 
“0000 


51°2195 
25°6098 
12°4765 
59099 
2°7154 
1°2068 
“5172 
0839 




TABLE—(continued). 


p=1 


64°6154 
26°9231 
7°0234 
1°2770 
"1520 
“0091 


45°1613 
31°1457 
150167 
5°9325 
1°9965 
5750 
1398 
0278 
0043 
“0004 
“0000 


25-6098 
26 °2664 
19°6998 
12°7783 
7°5427 
4°1377 
2°1297 
1°0326 
‘4719 
*2030 
“0819 
“0308 
‘0107 
0034 
“0010 
“0003 


p=2 


51°1538 
33°3612 
12°1313 
2°8884 
“4333 
0319 


29°5884 
31°7019 
21°1346 
10°8382 
4°5521 
1°5932 
4618 
1079 
0193 
0024 
“0002 


18°6275 
25°4011 
22°2259 
15°5343 
9°3206 
4°9495 
2°3569 
1:0101 
3885 
‘1330 
0399 
“0102 
0022 
0004 
“0000 
“0000 


12°4765 
19°6998 
20°2323 
16°8602 
12°2839 
80929 
4°9048 
2°7589 
1°4462 
‘7070 
"3218 


p=3 


40°0334 
36°3940 
17°3305 
5°1992 
*9577 
0851 


19°0211 
28°1795 
24°3861 
15°6071 
7°9661 
3°3250 
1°1335 
3084 
0636 
0089 
0006 


10°1604 
19°0508 
21°5090 
18°6411 
13°4987 
8°4849 
4°7138 
2°3310 
1°0256 


59099 
12°7783 
16°8602 
17°3419 
15°1742 
11°7715 

8°2768 

5°3399 
3°1817 
1°7554 
"8965 
"4226 
"1829 
0720 
"0255 
“0080 
22 
0005 
0001 
0000 


p=4 


30°9349 
36°8273 
22°0964 
8°1408 
1°8090 
1915 


11°9763 
23°0313 
24°8738 
19-3463 
11-7760 
5°7809 
2°2940 
°7210 
*1707 
0274 
0023 


5°3977 
13°0590 
18°2826 
19°1232 
16°3913 
12°0203 
7°7053 
4°855C 
2°1795 
$581 
+3658 
“1188 
‘0317 
“0065 
“0009 
0001 


2°7154 
7°5427 
12°2839 
15°1742 
15°6340 
14°0706 
11°3473 
8°3213 
5°5954 
3°4638 
1°9757 
1°0362 
“4974 
2168 








p=5 
23°5695 
35°3542 
26°0505 
11°5780 
3-0647 
3831 


7°3700 
17°6880 
23°2155 
21°5333 
15°4158 
8°8091 
4°0375 
1°4571 
“3946 
0722 
“0068 


2°7859 

8°3578 
14°1218 
17°4841 
17°4841 
14-7942 
10°8491 


11°7715 
14°0706 
14°5245 
13°3141 
11-0186 
8°3131 
5°7473 
3°6474 
2°1221 
171274 
5429 
"2345 
0893 
0293 
‘0080 
0017 
“0003 
“0000 






































m= § 
n=25) 


m=10{ 


n=25| 
m=15{ 


n=25) 
m= 20 | 


Successes 


Mires Co oe O 


N®A At’ CORO 


— 


ED Tie Co tT & 


p=0 


83°8710 
13°9785 
1°9281 
2065 
0153 
0006 


72°2222 
20°6349 
5°4622 
1°3242 
2897 
0561 
0093 
0013 
“0001 
“0000 
“0000 


63°4146 
23°7805 
8°5366 
2°9204 
"9472 
"2894 
0827 
“0218 
0053 
0012 
“0002 
“0000 
“0000 
*V000 


56°5217 
25°1208 
10°8476 
4°5409 
1°8380 
“7172 
2690 
0965 
0330 
0107 
0033 
0009 
“0002 
0001 
“0000 




TABLE—(continued). 


p=1 
69°8925 
24°1008 

5°1645 

"7651 
0736 
0035 


51°5873 
30°3455 
12°4141 
4°1380 
1°1680 
"2803 
"0564 
“0092 
0011 
“0001 
“0000 


39°6341 
30°4878 
16°8485 
7°8930 
3°2888 
1°2403 
4256 


31°4010 
28°5463 
18°9202 
10°8116 
5°6036 
2°6897 
1°2069 
“5082 
*2009 
“0744 
*0257 
“0082 
“0024 
“0006 
“0002 


p=2 


57°8421 
30°9868 
9°1813 
1°7656 
2119 
0123 


36°4146 
33°1041 
18°6211 
80091 
2°8032 
*8119 
1933 
“0368 
0053 
0005 
“0000 


24°3902 
28°8832 
21°8575 
13°1550 
6°7654 
3°0643 
1°2381 
4477 
"1444 
0412 
“0102 
"0022 
“0004 


p=3 


47°5131 
35°1949 
13°5365 
3°2488 
"4738 
0329 


25 °3798 
31°7248 
23°0261 
12°2806 
5°1875 
17786 
*4940 
“1086 
‘0179 
"0020 
“0001 


14°7625 
23°9392 
23°2742 
17°2894 
10°6788 
5°6953 
2°6697 
1°1072 
“4060 
*1307 
0364 
0086 
0016 
“0002 
“0000 
“0000 


9°1614 
17°4502 
20°2168 
18°1951 
13°8796 
9°3504 
5°6861 
3°1589 
1°6133 
"7592 
"3290 
1308 
0475 
“0156 
0046 
0012 
"0002 


p=4 


38°7144 
37°2254 
17°8682 
5°2115 
9064 
0741 


17°4486 
28°1430 
25°3287 
16°3035 
8°15i6 
3°2607 
1°0451 
*2628 
0493 
0062 
“0004 


8°7777 
18°2869 
21-9443 
19°5777 
14°2383 
8°8100 
4°7366 
2-2329 
9240 
3337 
1038 
0272 
0058 
0009 
0001 
0000 


4°7988 
11°7044 
16°6788 
17°9618 
160711 
12°5094 
8°6871 
5°4604 
3°1317 
1°6449 
“7916 
3482 
1393 
“0502 
0162 
0045 
0011 
0002 
“0000 


p=5 


31°2693 
37°5232 
21°8885 
7°6134 
1°5573 
"1483 


11°8200 
23°6401 
25°6780 
19°5642 
11°4125 
5°2673 
1°9313 
5518 
"1170 
0165 
“0012 


5°1203 
13°1666 
18°9753 
19°9337 
16°8191 
11°9361 
7°2943 
3°8807 
1°8017 
"7266 


16°4186 
14°5943 
11°4669 
8°0943 
5°1816 
3°0226 





Successes 


n=25| 0 
m= 25 f 





n=50| O 
n= 5f 1 


Cit So % 


n=50) 
m=10( 


SWAHNVAN* SMHS 


— 


n=50 
maibt 


13, 14, 15 





p=0 


50°9804 
25°4902 
12°4850 
5°9824 
2°8003 
1°2784 
“5682 


91°0714 
8°2792 
"6133 
0347 
0013 
“0000 


83°6065 
13°9344 
2°1256 
2932 
0360 
0039 
“0003 
"0000 
“0000 
“0000 
‘0000 
77°2727 
17°8322 
3°9008 




TABLE—(continued). 

p=1 p=2 p=3 
254902 12°4850 5°9824 
26°0104 19°5078 12°7285 
195078 19°9229 16°6024 
12°7285 16°6024 16-9713 
7°6094 12°1751 14°8499 
4°2613 8°1352 11°6037 
2°2598 5°0451 82883 
1°1411 2°9344 5°4870 
“5502 1°6103 3°3951 
*2535 *8365 1°9732 
“1115 “4118 1°0801 
0468 1921 *5573 
0187 “0848 ‘2709 
0070 0353 *1238 
0025 ‘0138 0531 
“0008 “0051 0212 
*0003 ‘0017 0079 
0001 “0005 *0027 
— “0002 “0008 
— “0000 “0002 
= _- “0001 
82°7922 75°1263 68-0389 
15°3319 21°2621 26°1688 
1°7357 3°2711 5°1311 
"1335 *3207 6157 
“0065 0192 “0440 
“0001 “0005 “0014 
69°6721 57°8633 47°8869 
23°6177 29°9293 33°6048 
5°4972 9°4513 13°5019 
1°0287 2°2503 3°9278 
“1607 *4296 “8910 
“0210 ‘0668 “1614 
0023 “0084 “0233 
0002 0008 0026 
“0000 “0001 “0002 
“0000 “0000 “0000 
59°4406 45°5092 34°6737 
27 °8628 32°5066 33°5552 
9°2876 14°6804 19:2530 
2°5965 5°2143 8°3429 
‘6385 1°5643 29695 
1405 *4083 ‘9011 
0278 “0939 2371 
“0049 ‘0191 "0544 
“0008 *0034 “0109 
0001 “0005 “0019 
— “0001 “0003 


p=4 


2°8003 
7°6094 
12°1751 
14°8499 
15°1953 
13°6757 
11°1185 
8°2990 
5°7456 
3°7128 
2°2477 
1°2771 
"6811 
“3406 
"1592 
0693 
0280 
0104 
0035 
“0010 
0003 
0001 


61°4967 
30°1454 
"2349 
0335 
0861 
0033 


esl 


39°4857 
35°2551 
17°3070 


26°2849 
32°3175 
22°6222 
11°6306 
4°8127 
1°6718 
“4976 
*1279 
0284 
0054 
0009 
“0001 





p=5 


1°2784 
4°2613 
8°1352 
11°6037 
13°6757 
14-0093 
12°8418 
10°7251 
8°2555 
59003 
3°9335 
2°4521 
1°4304 
*7802 
3971 
"1879 
0822 
0330 
-0120 
*0039 
“0011 
0003 
“0001 


55°4676 
33°2805 
9°5087 
1°5848 
“1517 
“0066 


32°4346 
35°3833 
20°6402 
8°3080 
2°5164 
5921 
1085 
0152 
“0015 
“0001 
“0000 


19°8214 
29°7321 
24°6927 
14°7589 
6°9910 
2°7465 
“9155 
"2616 
0642 
0132 
0024 
“0003 
“0000 








Successes 


n=50| 0 
m=20{ 1 
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n=50) O 
m=25{ 1 


© SVD Tie Ss % 


18—25 


n=50| 0 
m=50( 1 


~ 
SD WNH Aw  & 





p=0 

71°8310 
20°5231 
5°6513 
1°4959 
“3796 
“0920 
0212 
“0046 
‘0010 
“0002 


67°1053 
22°3684 
7°2546 
2°2857 
6984 
2066 
0590 
0163 
0043 
0011 
0003 
0001 


50°4950 
25°2475 
12°4963 
6°1206 
2°9657 
1°4210 
6731 
"3151 
*1457 
0665 
“0300 
0133 
0058 
0025 
“0011 
“0004 
0002 
0001 
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TABLE—(continued). 


p=1 
51°3078 
29°7437 
12°4661 

4°4655 
1°4377 
"4247 
1161 
0295 
0070 
0015 
0003 
0001 


44°7368 
30°2276 
14-9068 
6°3492 
2°4592 
8853 
2994 
0956 
"0289 
"0083 
0022 
0006 
‘0001 
“0000 


25°2475 
255026 
19°1269 
12°6198 
7°7231 
4°4875 
2°5063 
1°3552 
“7126 
3654 
1831 
“0898 


p=2 
36°4360 
32°1494 
18°2340 
8°2882 
3°2515 
1°1380 
“3613 
“1049 
“0279 
‘0068 
“0015 
“0003 
“0001 


29°6230 
30°4346 
20°2898 
10°9546 
5°1643 
2°2004 
*8629 
“3146 
1073 
0343 
0103 
0029 
0008 
0002 


12°4963 
19°1269 
19°3241 
16°1034 
11°9504 
8°1873 
5°2821 
3°2480 
1°9185 
1°0942 
“6049 
"3250 
1699 
0866 


p=3 
25°7195 
30°7099 
22°1018 
12°2410 
5°6902 
2°3122 
‘8391 
‘2751 
"0820 
“0222 
“0055 
0012 
“0002 
“0000 


19°4782 
27°0530 
22°8617 
15°0234 
8°3826 
4°1420 
1°8546 
“7627 


6°1206 
12°6198 
16°1034 
16°2729 
14°2388 
11-2686 
8°2677 
5°7108 
3°7517 
2°3606 
1°4298 
8367 
4743 
2610 
1396 
0726 
“0368 
0182 
0088 
0041 
0019 
“0008 
“0004 
0002 
“0001 
“0000 


p=4 
180421 
27°3365 
239720 
15°7316 
8°4901 
39438 
1°6163 
“5926 
“1959 
‘0585 
0158 
0039 
-0008 
0002 
-0000 


12°7149 
22°3854 
23°0250 
17°9083 
11°5877 
6°5376 
3°3018 
1°5167 
6398 
"2494 


p=5 
12°5748 
23°2150 
24°1218 
18-3785 
11°3384 
59480 
2°7262 
1°1089 
4039 
+1323 
"0390 
0103 
0024 
0005 
0001 
0000 


8°2378 
17°6525 
21°4900 
19°3831 
14°3204 
9°1130 
5°1406 
2°6162 
1°2147 
5181 







Successes p=0 p=1 
n=100) 0 90°9910 82-7191 
n= rf 1 8°2719 15°1778 
2 *6830 1°8972 
3 0506 “1891 
4 0033 “0156 
5 0002 0011 
6 “0000 0001 
7 “0000 
8 = me 
9 oe ees 
10 — — 
p=10 p=15 
0 33°5855 19°6056 
1 36°9441  33°0200 
2 20°1513 26°8727 
38 71284 13°8698 
4 =1°8005 5°0127 
5 *3376 1°3220 
6 0474 *2571 
7 0049 0363 
8 “0003 0036 
9 “0000 “0002 
10 — “0000 
Successes p=0 p=1 
n=100| O 95°2830 90°7457 
m= 5{ 1 4°5373 8°7256 
2 1745 5083 
3 “0051 0199 
4 “0001 0005 
5 “0000 0000 
n=100) 0 87°0690 75°7121 
m= 15{ i 11°3568 19°9243 
2 1°3947 3°7027 
8 1604 *5730 
4 0172 0774 
6 0017 0093 
6 0002 “0010 
7 “0000 “0001 
8 — 0000 
9 = BEX 
10 — — 
11-15 — oe 
n=100| 0 83°4711 69°5592 
m= eof 1 13°9119 23°3813 
2 2°2212 5°6472 
8 "3388 1°1584 
4 "0492 *2122 
5 0068 0354 
6 “0009 “0054 
7 “0001 “0008 
& “0000 “0001 
9 — “0000 
10 —— -- 
11 — -= 
12 ~~ —- 
18—20 — — 
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TABLE—(continued). 


p=2 p=3 p=4 p=5 p=6 
75°1302 §=68°1737 ~=—s 6 1°8023. »-555°9719 §=—-50°6 412 
20°8695 25°4855 29°1520 31°9839 34-0855 
3°5107 5°4097 74962 9°6874 11°9134 
4416 8243 1°3455 2°0065 2°8031 
"0442 0971 "1829 “3098 *4857 
“0036 -0090 “0194 0368 0641 
“0002 0007 0016 “0034 0065 
“0000 “0000 0001 “0002 “0005 
p=20 p=25 p=30 p=35 p=40 
11°0992 6°0712 3°1945 1°6083 “7697 
25°8982 18°5708 12°3788 7°7198 4°5082 
28°8081 26°8613 22°5639 17°3696 12°3485 
20°0784 24°1644 25°4567 24°1112 20°8229 
9°6930 14°9554 19°6711 22°8554  23°9308 
3°3813 6°6468 10°8709 15°4516 19°5797 
*8619 2°1464 4°3483 7°5418 11°5470 
"1583 “4968 1°2424 2°6232 4°8456 
“0200 0788 "2425 6221 1°3845 
0015 0077 “0292 “0908 "2432 
“0001 "0004 0017 0062 0199 
86°3830 82°1896 78°1607 74°2914 70°5768 67°0123 
12°5800 16°1156 19°3467 22°2874 24°9514 27°3520 
‘9867 1°5956 2°3216 3°1517 4°0737 5:0756 
“0488 0957 "1642 2573 “3780 *5287 
“0015 “0034 0067 0119 ‘0197 0306 
“0000 “0001 “0001 “0003 0004 “0008 
65°7500 57°0221 49°3852 42°7116 36°8873 31-8110 
26°1836 30°5476 33°3684 34°9458 35°5336 35°3456 
675459 9°6321 12°7407 15°7096 18°4248 20°8110 
1°2777 =. 2°2767 «= 35456 3=—-5 0426 =— «67156 = 85076 
2091 "4386 ‘7879 =. 1°2724 +=61°9006 = - 26738 
0295 0715 "1458 2641 4381 6787 
0037 0100 0229 0461 0842 "1428 
0004 0012 0031 “0068 0137 "0252 
“0000 “0001 “0004 “0009 0019 0037 
—_ “0000 “0000 0001 0002 0005 
—_— _ — “0000 “0000 0001 
— — — _— — 0000 
57°8686 48°0604 39°8449 32°9751 27°2403 22°4613 
29°4247 32°8618 34°3491 34°4088 33°4530 31°8036 
9°5567 13°4563 17°0252 20°0718 22°4994 24°2787 
2°4716 4°2124 6°2724 8°5261 10°8479 13°1236 
5480 1°0993 1°8873 2°9118 4°1535 5°5775 
"1077 *2490 *4853 8394 1°3291 1°9649 
0191 0500 “1093 2099 "3658 5913 
0031 “0090 0219 0462 “0881 "1547 
“0004 “0015 “0039 “0090 0187 0356 
“0001 “0002 “0006 0016 “0035 0072 
“0000 “0000 “0001 “0002 “0006 0013 
— _— “0000 “0000 “0001 “0002 
— —_ _— — “0000 “0000 


p=7 
45°7719 
35°5510 
14°1158 
3°7269 
7174 
1044 
0115 
0010 
0001 
0000 


p=45 


3473 
2°4580 
8°1229 

16°5036 
22°8255 
22°4513 
15-9030 
8°0093 
2°7446 
5778 
0567 


p=8 
63-5933 
29°5020 
6°1463 
“7117 
0454 
‘0013 


27°3928 
34°5611 
22°8233 
10°3611 

3°5865 


18°4859 
29°7094 
25°4270 
15°2562 
7°1382 
2°7495 
*8994 


p= 8 
41°3280 
36°4659 
16°2472 

4°7658 

1°0109 


"1463 
1°2434 
4°9313 
12°0167 
19°9224 
23°4799 
19°9224 
12-0167 

4°9313 

1°2434 

"1463 


p=9 
60°3153 
31-4142 
7°2749 
“9287 
0649 
-0020 


23°5527 
33°3293 
24°4415 
12-2207 
4°6273 
1°3973 
3459 
‘0711 
0122 
0017 
“0002 
0000 


15°1848 
27-3600 
25-9920 
17-1690 
8°7832 
3°6775 
1°3010 
“3965 
‘1053 
“0245 
“0050 
“0009 
0001 
“0000 
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p=10 
57°1739 
33°1007 
8°4512 
1°1813 
“0899 
“0030 


20°2198 
31°7739 
25°6636 
14-0361 
5°7796 
1°8884 
*5036 
1112 
0204 
0031 
0004 
“0000 


12°4488 
24°8976 
26°0397 
18°8065 
10°4578 
4°7356 
1°8040 
“5898 
1675 
0416 
0091 
“0017 
0003 
“0000 


























Successes 


m= 254 1 


© Co WD Tit Co % 


m= 50 


9 2% 


CwMNAAY 





p=0 


na Bt 0 80°1587 


16°0317 


n=100| 0 66°8874 
1 22°2958 


7°3322 


p=1 


p=2 
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TABLE—(continued). 


p=3 


p=4 


64°1270 51°1981 40°7920 32-4330 
25°8576 31°2184 33°4361 
12°2826 16°5799 20°1031 


7°5681 
1°9024 
*4323 
“0908 
0178 
“0033 
"0005 
0001 


44°5916 
29°9273 
14°8625 
6°4708 
2°6038 
9912 
3614 
1271 
0433 
0143 
0046 
“0014 
0004 
“0001 


3°8912 
10701 
"2644 


29°6280 
30°0284 
20°0189 
10°9693 
5°3333 
2°3852 
10008 
3987 
"1520 


6°3556 
2°0562 
5855 


19°6185 
26°6919 
22°3956 
14°8274 
8°4691 
4°3589 
2°0720 
9237 
3901 
"1572 


33°5052 


9°0661 
3°3806 


12°9456 
22°1671 
22°4728 
17°4789 
11°4896 
6°6996 
3°5636 
1°7600 
8167 
*3590 
*1504 
0603 
0232 
0086 
0031 
0011 
0004 
0001 


p=5 
25°7320 
32°1649 
22°7047 
11-8013 
49929 
1°8077 
5764 


p=6 
20°3711 
29°9575 
24°3723 
14°3734 
6°8150 
2°7378 
“9606 
“3000 
0844 
0215 
0050 
0011 
0002 
“0000 


5°5769 
13°5550 
18°5789 
18°8406 
15°7005 
11°3492 
7°3484 
4°3512 
2°3900 
1°2301 
5978 
2758 
"1213 
0510 
0206 
“0080 
0030 
0011 
*0004 
0001 


p=7 


16°0915 
27°2737 
25°1757 
16°6391 
8°7536 
3°8700 
1°484] 


3°6405 
10°1832 
15°8126 
17°9434 
16°5656 
13°1572 
9°2958 
5°9710 
3°5398 
1°9578 
1°0184 
“5012 
2345 
"1046 


p=8 


12°6823 
24°3890 
25°2300 
18°5020 
10°7117 
5°1757 
2°1565 
*7910 
"2589 


2°3676 
7°5029 
13°0370 
16°3894 
16°6252 
14°4085 
11°0430 
7°6559 
4°8771 
2°8874 
1°6022 
"8386 
“4161 
*1965 
“0886 
0382 
0158 
0063 
0024 
“0009 
0003 
“0001 


p=9 


9°9724 
21°4922 
24°6693 
19°9086 
12°5970 
6°6134 
2°9790 
11761 
*4127 
"1299 
0369 
"0095 
0022 
“0005 
0001 
“0000 


1°5339 
5°4395 
10°4710 
14°4635 
16°0095 
15°0512 
12°4505 
9°2753 
6°3248 
3°9946 
2°3574 
1°3088 
6871 
3425 
*1627 
0738 
0320 
0133 
“0053 
“0020 
0008 
“0003 
“0001 


p=10 


7°8232 
18°7076 
23°6306 
20°8423 
14°3291 
8°1327 
3°9431 
1°6693 
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ON THE PROBABLE ERROR OF THE CORRELATION 
COEFFICIENT TO A SECOND APPROXIMATION*. 


By H. E. SOPER, M.A. 


(1) It is very important in determining whether the coefficient of correlation 
as found by any particular method differs significantly from the calculated value to 
know not only its standard deviation but also to have some idea of the nature 
of the frequency distribution. When the numbers dealt with are large, then, 
provided r be not nearly ±1, we may quite legitimately assume a normal 
distribution and calculate the frequency of r on this basis. But if n be small, 
or if r have a value near either end of the range, then the usual values for the 
S.D. of r are not applicable and, what is more, in the latter case the frequency of r 
is of a markedly skew character and differs widely from a Gaussian curve. In such 
case the value of r found from a single sample will most probably be neither the 
true r of the material nor the mean value of r as deduced from a large number 
of samples of the same size, but the modal value of r in the given frequency 
distribution of r for samples of this size. In this paper the following notation will 
be used: 


$\rho$ = correlation coefficient of the material from which the sample is drawn; 

$\bar r$ = mean value of correlation coefficient for N samples of size n; 

$\breve r$ = modal value of the correlation in the distribution of the values of r as found 
from N samples of size n; 

r = correlation coefficient of any arbitrary sample of size n. 


The first question we have to answer is what is likely to be the distribution 
of the r’s. Clearly, when $\rho$ differs from unity, it must be a skew distribution of 
limited range lying between +1 and −1. The general skew curves discussed 
in Phil. Trans., Vol. 186 A, pp. 343-414, have proved themselves so capable of 
describing all sorts of types of frequency that one naturally turns to them in the 
first place in the present problem. There appears very little chance of successfully 
determining, at least for a product-moment table, the distribution of r. We 
must start with the assumption of a reasonable frequency distribution and justify 


* The frequency-distribution of the correlation coefficient in small samples was first discussed by 
“Student” in his paper in Biometrika, Vol. VI. pp. 302-10; he invited further mathematical investigation 
and to a large extent supplied the impulse and direction to the present paper. 
it a posteriori by means of experimental samples for given $\rho$ and given n. Now 
the only type among the skew curves mentioned applicable in the present case 
is of the form: 

$$y = y_0\left(1 - \frac{x}{a_1}\right)^{m_1}\left(1 + \frac{x}{a_2}\right)^{m_2} \qquad \text{(i)},$$

where, if the origin be at the mode, we must have 

$$m_1/a_1 = m_2/a_2 \qquad \text{(ii)}.$$

Now if we suppose $\rho$ to be positive, we clearly have 

$$a_1 = 1 - \breve r, \qquad a_2 = 1 + \breve r.$$














[Figure: sketch of the frequency curve of r over the limited range −1 to +1, with the origin at the mode.] 
Hence from (i) and (ii) 

$$y = y_0\left(1 - \frac{x}{1-\breve r}\right)^{m_1}\left(1 + \frac{x}{1+\breve r}\right)^{m_2} \qquad \text{(iii)},$$

$$\breve r = (m_2 - m_1)/(m_2 + m_1) \qquad \text{(iv)}.$$

Now let $\sigma_r$ denote the standard-deviation of the distribution. Then we easily 
find (Phil. Trans., loc. cit. p. 368) 

$$1 + \bar r = \frac{2(m_2+1)}{m_1+m_2+2} \qquad \text{(v)},$$

$$\sigma_r^2 = \frac{4(m_1+1)(m_2+1)}{(m_1+m_2+2)^2(m_1+m_2+3)} \qquad \text{(vi)}.$$

Thus 

$$\bar r = (m_2 - m_1)/(m_1 + m_2 + 2) \qquad \text{(vii)},$$

$$1 - \bar r^2 = 4(m_1+1)(m_2+1)/(m_1+m_2+2)^2,$$

and 

$$\sigma_r^2 = (1 - \bar r^2)/(m_1 + m_2 + 3) \qquad \text{(viii)}.$$

It follows that 

$$m_1 + m_2 + 3 = \frac{1-\bar r^2}{\sigma_r^2} = \lambda, \text{ say},$$

$$m_2 - m_1 = \bar r\left(\frac{1-\bar r^2}{\sigma_r^2} - 1\right) = \bar r(\lambda - 1).$$

Accordingly 

$$m_1 = \tfrac{1}{2}(\lambda-1)(1-\bar r) - 1, \qquad m_2 = \tfrac{1}{2}(\lambda-1)(1+\bar r) - 1 \qquad \text{(ix)}.$$

Substituting in (iv) we have 

$$\breve r = \bar r(\lambda-1)/(\lambda-3) \quad \text{and} \quad d = \breve r - \bar r = 2\bar r/(\lambda-3) \qquad \text{(x)}.$$
Since $\sigma_r^2 = (1-\bar r^2)/\lambda$, and must grow very small as the number in the sample 
grows large, i.e. $\lambda$ grows large, we see that $\breve r$ and $\bar r$ rapidly become equal as the 
sample increases or the distribution becomes symmetrical. 
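
The relations (vii) to (x) amount to a recipe for the constants of the curve once the mean and standard deviation of r are given. A minimal sketch of that recipe follows; the function name `curve_constants` and the trial values (mean 0.6, standard deviation 0.1) are illustrative assumptions, not figures from the paper.

```python
def curve_constants(r_mean, sigma_r):
    """Return (lam, m1, m2, r_mode) from the mean and S.D. of r.

    lam is lambda = m1 + m2 + 3; m1 and m2 follow (ix); the mode follows (x).
    """
    lam = (1.0 - r_mean ** 2) / sigma_r ** 2
    m1 = 0.5 * (lam - 1.0) * (1.0 - r_mean) - 1.0
    m2 = 0.5 * (lam - 1.0) * (1.0 + r_mean) - 1.0
    r_mode = r_mean * (lam - 1.0) / (lam - 3.0)
    return lam, m1, m2, r_mode


lam, m1, m2, r_mode = curve_constants(0.6, 0.1)
# The gap between mode and mean is 2 * r_mean / (lam - 3), shrinking as lam grows.
print(lam, m1, m2, r_mode, r_mode - 0.6)
```

Feeding the returned m1 and m2 back into (iv) and (vii) should recover the same mode and mean, which is a convenient check on the algebra.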


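The value of $y_0$ can be found from (Phil. Trans., loc. cit. p. 369) 

$$y_0 = \frac{N}{2}\,\frac{m_1^{m_1} m_2^{m_2}}{(m_1+m_2)^{m_1+m_2}}\,\frac{(m_1+m_2+1)\,\Gamma(m_1+m_2+1)}{\Gamma(m_1+1)\,\Gamma(m_2+1)} \qquad \text{(xi)}.$$

The problem of the distribution of r would thus be completely solved, if we 
knew: 

(a) $\bar r$ in terms of $\rho$, 
(b) $\sigma_r$ in terms of $\rho$. 

Using Stirling’s Theorem we can reduce the expression of $y_0$ to 

$$y_0 = \frac{N}{2\sqrt{2\pi}}\,\frac{(m_1+m_2+1)\sqrt{m_1+m_2}}{\sqrt{m_1 m_2}}\; e^{\frac{1}{12}\left(\frac{1}{m_1+m_2} - \frac{1}{m_1} - \frac{1}{m_2}\right)}$$

$$= \frac{N}{\sqrt{2\pi}\,\sigma_r}\,\sqrt{\frac{\left(1+\frac{1}{m_1}\right)\left(1+\frac{1}{m_2}\right)}{1+\frac{3}{m_1+m_2}}}\;\frac{m_1+m_2+1}{m_1+m_2+2}\; e^{\frac{1}{12}\left(\frac{1}{m_1+m_2} - \frac{1}{m_1} - \frac{1}{m_2}\right)}$$

$$\approx \frac{N}{\sqrt{2\pi}\,\sigma_r}\left\{1 + \frac{1}{12}\left(\frac{5}{m_1} + \frac{5}{m_2} - \frac{29}{m_1+m_2}\right)\right\} \qquad \text{(xii)}.$$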


This approaches rapidly to the Gaussian value $N/(\sqrt{2\pi}\,\sigma_r)$, if $\sigma_r$ be at all small 
and therefore $\lambda$ and $m_1$ and $m_2$ large. For most purposes it is sufficient to take 

$$y = \frac{N}{\sqrt{2\pi}\,\sigma_r}\left(1 - \frac{x}{1-\breve r}\right)^{\frac{1}{2}(\lambda-1)(1-\bar r)-1}\left(1 + \frac{x}{1+\breve r}\right)^{\frac{1}{2}(\lambda-1)(1+\bar r)-1} \qquad \text{(xiii)},$$

where the relation between $\breve r$ and $\bar r$ is given by (x) and it only remains to 
determine $\bar r$ and $\sigma_r$ in terms of $\rho$. 
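
How close the modal ordinate lies to the Gaussian value can be checked numerically. The sketch below evaluates the gamma-function form of the modal ordinate for a curve of this type on the range −1 to +1 (taking the total range as 2, which is an assumption about the lost prefactor in the scanned formula) and compares it with $N/(\sqrt{2\pi}\,\sigma_r)$. The function names and the trial constants are hypothetical.

```python
from math import exp, lgamma, log, pi, sqrt


def modal_ordinate(m1, m2, N=1.0):
    """Ordinate at the mode of y = y0 (1 - x/a1)^m1 (1 + x/a2)^m2 on (-1, +1)."""
    # Work in logarithms so that large m1, m2 do not overflow.
    log_y0 = (log(N / 2.0)
              + m1 * log(m1) + m2 * log(m2) - (m1 + m2) * log(m1 + m2)
              + lgamma(m1 + m2 + 2.0) - lgamma(m1 + 1.0) - lgamma(m2 + 1.0))
    return exp(log_y0)


def sigma_from_constants(m1, m2):
    """Standard deviation of r from m1, m2, as in relation (vi)."""
    s = m1 + m2
    return sqrt(4.0 * (m1 + 1.0) * (m2 + 1.0) / ((s + 2.0) ** 2 * (s + 3.0)))


def gaussian_ordinate(sigma_r, N=1.0):
    """Central ordinate of a normal curve with the same total N and S.D."""
    return N / (sqrt(2.0 * pi) * sigma_r)


m1, m2 = 11.6, 49.4  # trial constants giving mean 0.6 and S.D. 0.1
ratio = modal_ordinate(m1, m2) / gaussian_ordinate(sigma_from_constants(m1, m2))
print(ratio)
```

For these trial constants the ratio differs from unity by well under two per cent, in line with the remark that the Gaussian value suffices for most purposes.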


(2) Now the product moment value of the coefficient of correlation, $\rho$, between 
two measured characters in any population is defined by 

$$\rho = \frac{p_{11} - p_{10}\,p_{01}}{\sqrt{p_{20} - p_{10}^2}\times\sqrt{p_{02} - p_{01}^2}} \qquad \text{(xiv)},$$

$p_{10}$, $p_{20}$ being the first and second moments in respect of the first character, 
$p_{01}$, $p_{02}$ those in respect of the second character, and $p_{11}$ being the first product 
moment, all derived from measures of individuals taken from some arbitrary 
origins of measurement in the two characters. 

If samples of number n are selected at random the moments will have 
different values $p'_{10}$, $p'_{20}$ etc., and in consequence the coefficient of correlation a 
different value, r, in any sample, and 

$$r = \frac{p'_{11} - p'_{10}\,p'_{01}}{\sqrt{p'_{20} - p'^{\,2}_{10}}\times\sqrt{p'_{02} - p'^{\,2}_{01}}} \qquad \text{(xv)}.$$
The mean values of $p'_{10}$, $p'_{20}$ etc. in all samples are $p_{10}$, $p_{20}$ etc., since the 
moments are crude and simple averages of individual values. Let $dp_{10}$, $dp_{20}$, 
$dp_{01}$, $dp_{02}$, $dp_{11}$ be the deviations of $p'_{10}$, $p'_{20}$, $p'_{01}$, $p'_{02}$, $p'_{11}$ from their means $p_{10}$, $p_{20}$, 
$p_{01}$, $p_{02}$, $p_{11}$. The mean value of r we have called $\bar r$. Let dr be the deviation of r 
from its mean $\bar r$, then (xv) becomes 

$$\bar r + dr = \frac{p_{11} + dp_{11} - (p_{10} + dp_{10})(p_{01} + dp_{01})}{\sqrt{p_{20} + dp_{20} - (p_{10} + dp_{10})^2}\times\sqrt{p_{02} + dp_{02} - (p_{01} + dp_{01})^2}} \qquad \text{(xvi)}.$$

Choose the fixed origin of measurement of each character to be the mean 
of that character in the whole population, then $p_{10} = p_{01} = 0$, and (xvi) becomes 

$$\bar r + dr = \frac{p_{11} + dp_{11} - dp_{10}\,dp_{01}}{\sqrt{p_{20} + dp_{20} - dp_{10}^2}\times\sqrt{p_{02} + dp_{02} - dp_{01}^2}} \qquad \text{(xvii)}.$$
If the distributions and correlations of the deviations of the moments in 
samples of n are known this is the equation for determining the distribution of 
the values of the correlation coefficient. The average value of the right-hand 
side of (xvii) will be $\bar r$. The average values of the square, cube etc. of the 
right-hand expression will give the crude second, third etc. moments of r from 
which the moments of deviations from mean value of the correlation coefficient 
can be derived. 
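
The definition of r through crude moments about arbitrary origins can be illustrated directly. The sketch below computes the five crude moments of a sample and forms r from them; it also shows that shifting either origin leaves r unchanged, which is what permits the origins to be placed at the population means. The function name and the small data set are made up for the illustration.

```python
from math import sqrt


def correlation_from_crude_moments(xs, ys, x0=0.0, y0=0.0):
    """Product-moment r from crude moments taken about origins x0, y0."""
    n = len(xs)
    u = [x - x0 for x in xs]
    v = [y - y0 for y in ys]
    p10 = sum(u) / n                              # first moment, first character
    p01 = sum(v) / n                              # first moment, second character
    p20 = sum(a * a for a in u) / n               # second moment, first character
    p02 = sum(b * b for b in v) / n               # second moment, second character
    p11 = sum(a * b for a, b in zip(u, v)) / n    # first product moment
    return (p11 - p10 * p01) / (sqrt(p20 - p10 ** 2) * sqrt(p02 - p01 ** 2))


xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]
print(correlation_from_crude_moments(xs, ys))
print(correlation_from_crude_moments(xs, ys, x0=10.0, y0=-7.0))
```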








Now if (xvii) be expanded in powers and products of the deviations it may 
be anticipated that the average values of terms of higher order in the deviations 
are of higher order in 1/n, and that there is a limit to the number of terms needed 
to give a required approximation. The approximation sought in the crude moments 
of r is to terms in $1/n^2$ only, in order that the moments from the mean may be to 
terms in $1/n^2$, and so that $\sigma_r^2$ for instance, which is known* to have the value 
$(1 - \rho^2)^2/n$ to the first approximation for normal frequency, may be further carried 
to a term in $1/n^2$. 

Thus the process for determining 7 is to expand (xvii) and to find and insert 
in it the average values in samples of n of the various powers and products of the 
deviations of the moments involved, carrying the process on as far as is necessary 
to gather in all significant terms as defined above: and a similar process applied 
to the squares and cubes etc. of (xvii) determines the higher moments of r. 


Were the samples sufficiently large these deviations would approximate, as has 
been shown, to normal distributions, and the known properties of such distributions 
could be utilised in evaluating the complicated mean, but we are dealing with 
small samples where the deviations are not so distributed, and it is necessary, 
in the first place, to evaluate these moments of deviations in terms of the higher 
moments of the whole distribution without making any assumptions or any 
approximations within the limits assigned. After this is done the distribution 
of the two characters will be assumed normal and the results expressed in terms
of $\rho$, the coefficient of correlation of the material examined, and n, the number
in the sample, only.

* Biometrika, Vol. IX. p. 5 (if $\beta_2 = \beta_2' = 3$).

The method adopted in this paper is that of grade groups. It is well known
that if in an indefinitely large population the fraction f fall into a certain grade
of a character or combination of grades of two characters, then in taking random
samples of n the numbers of this group to be found in such samples follow the
binomial distribution of frequency

$$f^n + n f^{n-1}(1-f) + \frac{n(n-1)}{1\cdot 2} f^{n-2}(1-f)^2 + \dots$$
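and that the mean number is nf and that the deviations from this mean number
have moments

$$\begin{aligned}
\text{mean } (df)^2 &= \frac{1}{n} f(1-f),\\
\text{mean } (df)^3 &= \frac{1}{n^2} f(1-f)(1-2f),\\
\text{mean } (df)^4 &= \frac{3}{n^2} f^2(1-f)^2 + \frac{1}{n^3} f(1-f)(1-6f+6f^2),\\
\text{mean } (df)^5 &= \frac{10}{n^3} f^2(1-f)^2(1-2f) + \frac{1}{n^4} f(1-f)(1-2f)(1-12f+12f^2),
\end{aligned}$$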


etc., the fourth moment being the last which gives terms in $1/n^2$ [see Pearson,
Phil. Trans. Vol. 186, p. 347 and Phil. Mag. 1899, pp. 240, 241]. Here df is the
deviation from mean value f of the frequency of the group in a sample of n.
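The binomial moments quoted above can be checked by direct summation. The following is a minimal sketch, assuming the sample fraction is x/n with x binomially distributed; the function names `central_moment` and `formula` are illustrative and not part of the paper.

```python
from fractions import Fraction
from math import comb

def central_moment(n, f, k):
    """Exact k-th moment of the deviation df = x/n - f, x ~ Binomial(n, f)."""
    f = Fraction(f)
    return sum(comb(n, x) * f**x * (1 - f)**(n - x) * (Fraction(x, n) - f)**k
               for x in range(n + 1))

def formula(n, f, k):
    """Closed forms for mean (df)^k, k = 2, 3, 4, as quoted in the text."""
    f = Fraction(f)
    q = 1 - f
    if k == 2:
        return f * q / n
    if k == 3:
        return f * q * (1 - 2 * f) / n**2
    if k == 4:
        return 3 * f**2 * q**2 / n**2 + f * q * (1 - 6 * f + 6 * f**2) / n**3
    raise ValueError("only k = 2, 3, 4 are implemented in this sketch")

# Example: samples of n = 7 from a group of frequency f = 3/10 (arbitrary values).
n, f = 7, Fraction(3, 10)
agree = all(central_moment(n, f, k) == formula(n, f, k) for k in (2, 3, 4))
```

Exact rational arithmetic is used so that agreement is an identity rather than a floating-point coincidence.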

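Moreover if $f_1, f_2, f_3, \dots$ are the totality of frequencies of the various detached
groups into which the population is divided by the graduation (which in our case
is a double one) of character the various product moments of the deviations in
samples of n may be deduced. These and the above, as far as our approximation
needs, are put in one table as follows:

$$\begin{aligned}
\text{mean } (df_1)^2 &= \frac{1}{n} f_1(1-f_1) &&\text{(xviii)},\\
\text{mean } df_1\,df_2 &= -\frac{1}{n} f_1 f_2 &&\text{(xix)},\\
\text{mean } (df_1)^3 &= \frac{1}{n^2} f_1(1-f_1)(1-2f_1) &&\text{(xx)},\\
\text{mean } (df_1)^2\,df_2 &= -\frac{1}{n^2} f_1 f_2 (1-2f_1) &&\text{(xxi)},\\
\text{mean } df_1\,df_2\,df_3 &= \frac{2}{n^2} f_1 f_2 f_3 &&\text{(xxii)},\\
\text{mean } (df_1)^4 &= \frac{3}{n^2} f_1^2(1-f_1)^2 &&\text{(xxiii)},\\
\text{mean } (df_1)^3\,df_2 &= -\frac{3}{n^2} f_1^2 f_2 (1-f_1) &&\text{(xxiv)},\\
\text{mean } (df_1)^2 (df_2)^2 &= \frac{1}{n^2} f_1 f_2 (1 - f_1 - f_2 + 3 f_1 f_2) &&\text{(xxv)},\\
\text{mean } (df_1)^2\,df_2\,df_3 &= -\frac{1}{n^2} f_1 f_2 f_3 (1 - 3f_1) &&\text{(xxvi)},\\
\text{mean } df_1\,df_2\,df_3\,df_4 &= \frac{3}{n^2} f_1 f_2 f_3 f_4 &&\text{(xxvii)},
\end{aligned}$$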











96 On the Probable Error of the Correlation Coefficient 


the last five values being approximate and wanting terms in $1/n^3$ to render them
exact.
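The second and third order results for several groups are exact and can be confirmed by enumerating every possible sample. The sketch below assumes multinomial sampling of n individuals into four groups; the frequencies, the sample size and the function name `mean_product` are arbitrary illustrative choices.

```python
from fractions import Fraction
from itertools import product
from math import factorial

def mean_product(n, f, powers):
    """Exact mean of prod_i (df_i)**powers[i] over multinomial samples of n,
    where df_i = (count_i / n) - f_i."""
    total = Fraction(0)
    for counts in product(range(n + 1), repeat=len(f)):
        if sum(counts) != n:
            continue
        # multinomial probability of this set of group counts
        weight = Fraction(factorial(n))
        for c, fi in zip(counts, f):
            weight *= fi**c / factorial(c)
        dev = Fraction(1)
        for c, fi, k in zip(counts, f, powers):
            dev *= (Fraction(c, n) - fi)**k
        total += weight * dev
    return total

# Hypothetical group frequencies (they must sum to 1) and sample size.
f = [Fraction(2, 5), Fraction(3, 10), Fraction(1, 5), Fraction(1, 10)]
n = 6
f1, f2, f3 = f[0], f[1], f[2]
```

With these values `mean_product(n, f, (1, 1, 0, 0))` reproduces the product moment of two group deviations, and `(2, 1, 0, 0)` and `(1, 1, 1, 0)` the two mixed third order moments.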


The method of derivation of the product from the power moments is illustrated 
in the following example. 


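$$\text{Mean } (df_1)^2\,df_2\,df_3 = \text{mean } \{(df_1)^2 \times \text{mean } df_2\,df_3 \text{ for constant } df_1\}.$$

Now in samples of constant $df_1$ the number of 1's is $nf_1 + n\,df_1$ and of not 1's
$n - nf_1 - n\,df_1$, amongst which latter restricted population in the whole community
the frequency of 2's will be $\frac{f_2}{1-f_1}$ and of 3's $\frac{f_3}{1-f_1}$. Hence the mean number
of 2's in such samples will be $(n - nf_1 - n\,df_1)\frac{f_2}{1-f_1}$ and of 3's
$(n - nf_1 - n\,df_1)\frac{f_3}{1-f_1}$, differing from the mean numbers in all samples, $nf_2$, $nf_3$, by
$-n\,df_1\frac{f_2}{1-f_1}$ and $-n\,df_1\frac{f_3}{1-f_1}$, and the mean product of the deviations from such
means in the restricted samples will be

$$-(n - nf_1 - n\,df_1)\,\frac{f_2}{1-f_1}\cdot\frac{f_3}{1-f_1}$$

by (xix). It follows that the mean product of the deviations $n\,df_2$, $n\,df_3$, which
are measured from the means of all samples, will be

$$-(n - nf_1 - n\,df_1)\,\frac{f_2 f_3}{(1-f_1)^2} + \left(-n\,df_1\frac{f_2}{1-f_1}\right)\left(-n\,df_1\frac{f_3}{1-f_1}\right),$$

i.e.

$$= -n\frac{f_2 f_3}{1-f_1} + n\,df_1\frac{f_2 f_3}{(1-f_1)^2} + n^2 (df_1)^2\frac{f_2 f_3}{(1-f_1)^2}$$

in samples of constant $df_1$. Dividing by $n^2$ we get the value of mean $df_2\,df_3$ for
constant $df_1$ and so obtain finally

$$\text{mean } (df_1)^2\,df_2\,df_3 = \text{mean } \left\{-\frac{1}{n}(df_1)^2\frac{f_2 f_3}{1-f_1} + \frac{1}{n}(df_1)^3\frac{f_2 f_3}{(1-f_1)^2} + (df_1)^4\frac{f_2 f_3}{(1-f_1)^2}\right\}$$

which by (xviii), (xx) and (xxiii)

$$= -\frac{1}{n^2} f_1 f_2 f_3 + \frac{1}{n^3} f_1 f_2 f_3\,\frac{1-2f_1}{1-f_1} + \frac{3}{n^2} f_1^2 f_2 f_3$$
$$= -\frac{1}{n^2} f_1 f_2 f_3 (1 - 3f_1),$$
to our approximation.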
The other formulae were arrived at in like manner but the process is lengthy 
and these formulae and the general formulae that follow have been verified by a 


shorter process, which however being less direct in method is not introduced in 
this paper. 


There is no necessity to take the products of the deviations more than four
together, for these do not give terms in $1/n^2$. Did any products, five together for
instance, give terms in $1/n^2$, then mean $(df_1 + df_2 + df_3 + df_4 + df_5)^5$ would have such
terms, which is contrary to the formula arrived at above for mean $(df)^5$.

Having obtained the various mean products of deviations of group frequencies
shown in equations (xviii) to (xxvii) the mean products of deviations of moments,
formed by associating with such group frequencies their grade values, follow.

Let $a_1, a_2, \dots$ be the values to be assigned to the grades 1, 2 ... in the
formation of the moment p (these values will in the present case be the product
of one power of one grade of one character with another or the same power of
one grade of the second character). Then

$$p = a_1 f_1 + a_2 f_2 + a_3 f_3 + \dots$$
and if $a_1', a_2', \dots$ are the values proper to a second moment, $p'$, in like manner
$$p' = a_1' f_1 + a_2' f_2 + a_3' f_3 + \dots,$$
and if in random samples of n deviations $df_1, df_2, \dots$ in the frequencies lead to
deviations $dp$, $dp'$, in the moments, all deviations being taken from the above
universal values which are also the mean values in samples, then
$$dp = a_1\,df_1 + a_2\,df_2 + a_3\,df_3 + \dots$$
$$dp' = a_1'\,df_1 + a_2'\,df_2 + a_3'\,df_3 + \dots$$
and so
$$\begin{aligned}
\text{mean } dp\,.\,dp' &= \text{mean } [a_1 a_1' (df_1)^2 + a_2 a_2' (df_2)^2 + \dots + a_1 a_2'\,df_1 df_2 + a_1' a_2\,df_1 df_2 + \dots]\\
&= \frac{1}{n}\,[a_1 a_1' f_1 (1 - f_1) + a_2 a_2' f_2 (1 - f_2) + \dots - a_1 a_2' f_1 f_2 - a_1' a_2 f_1 f_2 - \dots]\\
&= \frac{1}{n}\,[a_1 a_1' f_1 + a_2 a_2' f_2 + \dots - (a_1 f_1 + a_2 f_2 + \dots)(a_1' f_1 + a_2' f_2 + \dots)].
\end{aligned}$$


If then p is the u, v moment defined by

$$p_{uv} = a_1^u b_1^v f_{11} + a_1^u b_2^v f_{12} + a_2^u b_1^v f_{21} + \dots$$
obtained by summing the products of the group frequencies f by the uth power of
the grade value a of the first character and the vth power of the grade value b of
the second character in that group; and $p'$ is the $u'$, $v'$ moment defined by

$$p_{u'v'} = a_1^{u'} b_1^{v'} f_{11} + a_1^{u'} b_2^{v'} f_{12} + a_2^{u'} b_1^{v'} f_{21} + \dots$$
it follows that the first term in the above square brackets is
$$a_1^{u+u'} b_1^{v+v'} f_{11} + a_1^{u+u'} b_2^{v+v'} f_{12} + \dots$$

or $p_{u+u',\,v+v'}$, and the general formula for the mean products two together of
deviations of grade moments is*

$$\text{mean } dp_{uv}\,.\,dp_{u'v'} = \frac{1}{n}\,[p_{u+u',\,v+v'} - p_{uv}\,p_{u'v'}] \qquad \text{(xxviii)}.$$


* See W. F. Sheppard, "On the application of the theory of error to cases of normal distribution
and correlation," Phil. Trans. 1899 (192 A), in which paper (p. 127) are given formulae for the mean
products, two together, of errors of moments calculated from the means of samples. In the present
paper, it should be noted, the moments of the samples are crude, being calculated, not from the means
of the samples, but from the mean values of the measured characters in the whole population; and dp
is the deviation in the value of the crude moment in any particular sample from its mean value in all
samples, which is mean $\{a_1 (f_1 + df_1) + a_2 (f_2 + df_2) + \dots\} = a_1 f_1 + a_2 f_2 + \dots = p$ or the moment in the whole
population. This latter is a true moment, the general means having been taken as the origin of
measurement.

It will be observed that nothing in the proof prevents $u'$, $v'$ from having the
same values as u, v and the formula is true for any second order moment whether
power or product.





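In like manner if $p$, $p'$, $p''$ are any three moments of the material sampled we
have the equations of deviations
$$dp = a_1\,df_1 + a_2\,df_2 + a_3\,df_3 + \dots$$
$$dp' = a_1'\,df_1 + a_2'\,df_2 + a_3'\,df_3 + \dots$$
$$dp'' = a_1''\,df_1 + a_2''\,df_2 + a_3''\,df_3 + \dots$$
giving
$$\begin{aligned}
\text{mean } dp\,.\,dp'\,.\,dp'' = \text{mean } [\,& a_1 a_1' a_1'' (df_1)^3 + \dots \text{ all grades}\\
&+ (a_1 a_1' a_2'' + a_1 a_1'' a_2' + a_1' a_1'' a_2)\,(df_1)^2\,df_2\\
&+ (a_2 a_2' a_1'' + a_2 a_2'' a_1' + a_2' a_2'' a_1)\,df_1\,(df_2)^2 + \dots \text{ all pairs}\\
&+ (a_1 a_2' a_3'' + a_1 a_2'' a_3' + a_1' a_2 a_3'' + a_1' a_2'' a_3 + a_1'' a_2 a_3' + a_1'' a_2' a_3)\,df_1\,df_2\,df_3 + \dots \text{ all triads}],
\end{aligned}$$
and inserting the values from equations (xx), (xxi), (xxii),
$$\begin{aligned}
\text{mean } dp\,.\,dp'\,.\,dp'' = \frac{1}{n^2}\,[\,& a_1 a_1' a_1'' f_1 (1 - f_1)(1 - 2f_1) + \dots \text{ all grades}\\
&- (a_1 a_1' a_2'' + a_1 a_1'' a_2' + a_1' a_1'' a_2)\,f_1 f_2 (1 - 2f_1)\\
&- (a_2 a_2' a_1'' + a_2 a_2'' a_1' + a_2' a_2'' a_1)\,f_1 f_2 (1 - 2f_2) - \dots \text{ all pairs}\\
&+ (a_1 a_2' a_3'' + a_1 a_2'' a_3' + a_1' a_2 a_3'' + a_1' a_2'' a_3 + a_1'' a_2 a_3' + a_1'' a_2' a_3)\,2 f_1 f_2 f_3 + \dots \text{ all triads}],
\end{aligned}$$
and collecting terms of first, second and third degree in f's and suitably
commuting the f's and a's this is seen to be
$$\begin{aligned}
= \frac{1}{n^2}\,[\,& (a_1 a_1' a_1'' f_1 + \dots)\\
&- (a_1 a_1' f_1 + \dots)(a_1'' f_1 + \dots) - (a_1 a_1'' f_1 + \dots)(a_1' f_1 + \dots) - (a_1' a_1'' f_1 + \dots)(a_1 f_1 + \dots)\\
&+ 2\,(a_1 f_1 + \dots)(a_1' f_1 + \dots)(a_1'' f_1 + \dots)],
\end{aligned}$$
the sums being for all grades.
If then $p$, $p'$ have the double grade values $p_{uv}$, $p_{u'v'}$ previously assigned and
$p''$ stands in the same way for $p_{u''v''}$ where
$$p_{u''v''} = a_1^{u''} b_1^{v''} f_{11} + a_1^{u''} b_2^{v''} f_{12} + a_2^{u''} b_1^{v''} f_{21} + \dots$$
there results the general formula for the mean products three together of deviations
in moments as follows
$$\begin{aligned}
\text{mean } dp_{uv}\,.\,dp_{u'v'}\,.\,dp_{u''v''} = \frac{1}{n^2}\,[\,& p_{u+u'+u'',\,v+v'+v''} - p_{u+u',\,v+v'}\,p_{u''v''}\\
&- p_{u+u'',\,v+v''}\,p_{u'v'} - p_{u'+u'',\,v'+v''}\,p_{uv} + 2\,p_{uv}\,p_{u'v'}\,p_{u''v''}] \qquad \text{(xxix)},
\end{aligned}$$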
where, as before, the values of the suffixes may be any the same and the formula
gives power moments equally well with the product moments of the deviations.

Precisely the same process evaluates the mean products four together. We
shall have, putting in representative terms only of each series, and using equations
(xxiii), (xxiv), (xxv), (xxvi), (xxvii),

$$\begin{aligned}
\text{mean } dp\,.\,dp'\,.\,dp''\,.\,dp''' = \frac{1}{n^2}\,[\,& a_1 a_1' a_1'' a_1'''\,3 f_1^2 (1 - f_1)^2 + \dots\\
&- a_1 a_1' a_1'' a_2'''\,3 f_1^2 (1 - f_1) f_2 - \dots\\
&+ a_1 a_1' a_2'' a_2'''\,f_1 f_2 (1 - f_1 - f_2 + 3 f_1 f_2) + \dots\\
&- a_1 a_1' a_2'' a_3'''\,f_1 f_2 f_3 (1 - 3 f_1) - \dots\\
&+ a_1 a_2' a_3'' a_4'''\,3 f_1 f_2 f_3 f_4 + \dots]
\end{aligned}$$

$$\begin{aligned}
= \frac{1}{n^2}\,[\,& (a_1 a_1' f_1 + \dots)(a_1'' a_1''' f_1 + \dots)\\
&+ (a_1 a_1'' f_1 + \dots)(a_1' a_1''' f_1 + \dots)\\
&+ (a_1 a_1''' f_1 + \dots)(a_1' a_1'' f_1 + \dots)\\
&- (a_1 a_1' f_1 + \dots)(a_1'' f_1 + \dots)(a_1''' f_1 + \dots)\\
&- (a_1 a_1'' f_1 + \dots)(a_1' f_1 + \dots)(a_1''' f_1 + \dots)\\
&- (a_1 a_1''' f_1 + \dots)(a_1' f_1 + \dots)(a_1'' f_1 + \dots)\\
&- (a_1' a_1'' f_1 + \dots)(a_1 f_1 + \dots)(a_1''' f_1 + \dots)\\
&- (a_1' a_1''' f_1 + \dots)(a_1 f_1 + \dots)(a_1'' f_1 + \dots)\\
&- (a_1'' a_1''' f_1 + \dots)(a_1 f_1 + \dots)(a_1' f_1 + \dots)\\
&+ 3\,(a_1 f_1 + \dots)(a_1' f_1 + \dots)(a_1'' f_1 + \dots)(a_1''' f_1 + \dots)]
\end{aligned}$$

on collecting terms and rearranging the associations of f's and a's as before.
And putting into factors this

$$\begin{aligned}
= \frac{1}{n^2}\,[\,& \{(a_1 a_1' f_1 + \dots) - (a_1 f_1 + \dots)(a_1' f_1 + \dots)\}\,\{(a_1'' a_1''' f_1 + \dots) - (a_1'' f_1 + \dots)(a_1''' f_1 + \dots)\}\\
&+ \{(a_1 a_1'' f_1 + \dots) - (a_1 f_1 + \dots)(a_1'' f_1 + \dots)\}\,\{(a_1' a_1''' f_1 + \dots) - (a_1' f_1 + \dots)(a_1''' f_1 + \dots)\}\\
&+ \{(a_1 a_1''' f_1 + \dots) - (a_1 f_1 + \dots)(a_1''' f_1 + \dots)\}\,\{(a_1' a_1'' f_1 + \dots) - (a_1' f_1 + \dots)(a_1'' f_1 + \dots)\}].
\end{aligned}$$
And so again if the material is double graded and $p_{uv}$, $p_{u'v'}$, $p_{u''v''}$, $p_{u'''v'''}$ are any
four moments involving products of powers of both grades, the general formula
for the mean products four together of the deviations of such moments in samples
of n becomes

$$\begin{aligned}
\text{mean } dp_{uv}\,.\,dp_{u'v'}\,.\,dp_{u''v''}\,.\,dp_{u'''v'''} = \frac{1}{n^2}\,[\,& (p_{u+u',\,v+v'} - p_{uv}\,p_{u'v'})(p_{u''+u''',\,v''+v'''} - p_{u''v''}\,p_{u'''v'''})\\
&+ (p_{u+u'',\,v+v''} - p_{uv}\,p_{u''v''})(p_{u'+u''',\,v'+v'''} - p_{u'v'}\,p_{u'''v'''})\\
&+ (p_{u+u''',\,v+v'''} - p_{uv}\,p_{u'''v'''})(p_{u'+u'',\,v'+v''} - p_{u'v'}\,p_{u''v''})] \qquad \text{(xxx)},
\end{aligned}$$
and it is to be recalled that this formula omits terms in $1/n^3$ not wanted to the
degree of approximation laid down.


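Comparing (xxx) and (xxviii) it appears that the mean values four together
are equal, within our degree of approximation, to the sum of the products of the
mean values two together of the complementary pairs making the four, the
division being possible in three ways.

$$\begin{aligned}
\text{mean } dp_{uv}\,dp_{u'v'}\,dp_{u''v''}\,dp_{u'''v'''} ={}& \text{mean } dp_{uv}\,dp_{u'v'} \times \text{mean } dp_{u''v''}\,dp_{u'''v'''}\\
&+ \text{mean } dp_{uv}\,dp_{u''v''} \times \text{mean } dp_{u'v'}\,dp_{u'''v'''}\\
&+ \text{mean } dp_{uv}\,dp_{u'''v'''} \times \text{mean } dp_{u'v'}\,dp_{u''v''} \qquad \text{(xxxi)}.
\end{aligned}$$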


It is unnecessary to find the general formula for the mean products of
deviations five together, which by p. 96 will contribute nothing within our
approximation, and formulae (xxviii), (xxix) and (xxx) applied to the expansions
of (xvii) and its powers are sufficient to evaluate the general formulae for the
mean and moments of deviations of r as far as terms in $1/n^2$.

It is not proposed to exhibit these general formulae for moments of deviations
of r in terms of the higher moments of the given distribution at length,
but to proceed at once to the simpler case of a normal distribution in the two
correlated characters and reduce the higher moments to second moments and the
coefficient of correlation, $\rho$, as such distributions, it is well known, admit. In
order to reduce in this way the values of the various mean products at the same
time that they are evaluated by the formulae (xxviii), (xxix), (xxx), the necessary
formulae of reduction are next obtained.

The expression for dr involves $dp_{10}$, $dp_{01}$, $dp_{20}$, $dp_{02}$ and $dp_{11}$ and (xxix) shows
that we shall require to reduce $p_{30}, p_{21}, \dots, p_{40}, \dots, p_{50}, \dots, p_{60}, \dots$ in the above way.


Now it is well known that in normal distributions, following the Gaussian
law of frequency, the odd moments from the mean in either character vanish
and the even moments are derived from the second moment by a simple
formula of reduction, from which there results that

$$p_{10} = p_{30} = p_{50} = 0, \qquad p_{01} = p_{03} = p_{05} = 0,$$
$$p_{40} = 3p_{20}^2, \quad p_{60} = 5\cdot 3\,p_{20}^3, \quad p_{04} = 3p_{02}^2, \quad p_{06} = 5\cdot 3\,p_{02}^3.$$


And, utilising these results, the higher product moments of normal distributions 
in two characters may be derived from the first product moment and the second 
moments by two well-known properties of the Gaussian surface. 


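If x, y are deviations from their mean value of two normally correlated
characters the mean value of y for a given x is $\frac{p_{11}}{p_{20}}x$, and if $y'$ is the deviation
of y from its mean value in the array the distribution of $y'$ is normal; its second
moment is $p_{02} - \frac{p_{11}^2}{p_{20}}$; and its higher moments follow the same laws of reduction
as above. Hence

$$\begin{aligned}
p_{uv} &= \text{mean } x^u y^v\\
&= \text{mean } \{x^u \times \text{mean } y^v \text{ for given } x\}\\
&= \text{mean } \left\{x^u \times \text{mean } \left(\frac{p_{11}}{p_{20}}x + y'\right)^v \text{ for given } x\right\}\\
&= \text{mean } \left\{x^u \times \text{mean } \left[\frac{p_{11}^v}{p_{20}^v}x^v + v\,\frac{p_{11}^{v-1}}{p_{20}^{v-1}}x^{v-1}y' + \frac{v(v-1)}{1\cdot 2}\,\frac{p_{11}^{v-2}}{p_{20}^{v-2}}x^{v-2}y'^2 + \dots\right] \text{ for given } x\right\}
\end{aligned}$$

and so remembering that mean $y'$, mean $y'^3$, etc. vanish, and mean $y'^4$, mean $y'^6$, etc.
reduce by the above formulae we have

$$\begin{aligned}
p_{uv} ={}& \left(\frac{p_{11}}{p_{20}}\right)^v p_{u+v,\,0} + \frac{v(v-1)}{1\cdot 2}\left(\frac{p_{11}}{p_{20}}\right)^{v-2} p_{u+v-2,\,0}\left(p_{02} - \frac{p_{11}^2}{p_{20}}\right)\\
&+ \frac{v(v-1)(v-2)(v-3)}{1\cdot 2\cdot 3\cdot 4}\left(\frac{p_{11}}{p_{20}}\right)^{v-4} p_{u+v-4,\,0}\;3\left(p_{02} - \frac{p_{11}^2}{p_{20}}\right)^2\\
&+ \frac{v!}{6!\,(v-6)!}\left(\frac{p_{11}}{p_{20}}\right)^{v-6} p_{u+v-6,\,0}\;5\cdot 3\left(p_{02} - \frac{p_{11}^2}{p_{20}}\right)^3 + \dots \qquad \text{(xxxii)}.
\end{aligned}$$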


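It follows that if u + v is odd $p_{uv}$ is zero, that is
$$p_{30} = p_{21} = p_{12} = p_{03} = 0,$$
$$p_{50} = p_{41} = p_{32} = p_{23} = p_{14} = p_{05} = 0.$$
If u + v is even it is convenient to divide by suitable powers of $p_{20}$ and $p_{02}$, and
exhibit all the reduction formulae together as follows: putting $\rho$ for $\frac{p_{11}}{\sqrt{p_{20}}\sqrt{p_{02}}}$,

$$\begin{aligned}
&p_{40}/p_{20}^2 = p_{04}/p_{02}^2 = 3, \quad p_{31}/p_{20}^{3/2}p_{02}^{1/2} = p_{13}/p_{20}^{1/2}p_{02}^{3/2} = 3\rho, \quad p_{22}/p_{20}\,p_{02} = 1 + 2\rho^2,\\
&p_{60}/p_{20}^3 = p_{06}/p_{02}^3 = 15, \quad p_{51}/p_{20}^{5/2}p_{02}^{1/2} = p_{15}/p_{20}^{1/2}p_{02}^{5/2} = 15\rho,\\
&p_{42}/p_{20}^2\,p_{02} = p_{24}/p_{20}\,p_{02}^2 = 3 + 12\rho^2, \quad p_{33}/p_{20}^{3/2}p_{02}^{3/2} = 9\rho + 6\rho^3 \qquad \text{(xxxiii)}.
\end{aligned}$$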
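The reduction formulae can be reproduced by applying the series (xxxii) to standardised characters, for which the second moments are unity and the product moment is the correlation coefficient. The sketch below makes that assumption; the names `normal_moment` and `p` are illustrative only.

```python
from fractions import Fraction
from math import comb

def normal_moment(j):
    """j-th moment of a unit normal deviate: 1.3.5...(j-1) for even j, else 0."""
    if j % 2:
        return 0
    out = 1
    for t in range(1, j, 2):
        out *= t
    return out

def p(u, v, rho):
    """Standardised product moment mean(x^u y^v) by the series (xxxii),
    taking p20 = p02 = 1 and p11 = rho (an assumption of this sketch)."""
    rho = Fraction(rho)
    s = 1 - rho**2  # second moment of y within an array of x
    return sum(comb(v, k) * rho**(v - k) * normal_moment(u + v - k)
               * normal_moment(k) * s**(k // 2)
               for k in range(0, v + 1, 2))

rho = Fraction(1, 3)  # arbitrary rational value chosen for an exact check
```

For example `p(2, 2, rho)` returns 1 + 2ρ² and `p(3, 3, rho)` returns 9ρ + 6ρ³ evaluated at the chosen ρ.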


If in like manner the numerator and denominator of the expression (xvii)
for the deviation in r in terms of the deviations in the moments be divided by
$\sqrt{p_{20}}\sqrt{p_{02}}$ and we write

$$\alpha_1 = \frac{dp_{10}}{\sqrt{p_{20}}}, \quad \alpha_2 = \frac{dp_{20}}{p_{20}}, \quad \beta_1 = \frac{dp_{01}}{\sqrt{p_{02}}}, \quad \beta_2 = \frac{dp_{02}}{p_{02}}, \quad \gamma = \frac{dp_{11}}{\sqrt{p_{20}}\sqrt{p_{02}}},$$
it becomes
$$\bar{r} + dr = \frac{\rho + \gamma - \alpha_1\beta_1}{\sqrt{(1 + \alpha_2 - \alpha_1^2)(1 + \beta_2 - \beta_1^2)}} \qquad \text{(xxxiv)}.$$

When this is expanded, the mean values of $\alpha_1$, $\beta_1$, $\alpha_2$, $\beta_2$ and $\gamma$ are of course
zero. In the following tables the general formulae (xxviii), (xxix) and (xxxi) for
the mean products two, three and four together of deviations of moments in
samples of n are given at the head, the suffixes used and their composition in the
several formulae are shown in the initial columns (omitting repetitions of dp and p
to abbreviate the printed matter), and the resulting formulae for the mean products
of the $\alpha$'s, $\beta$'s and $\gamma$'s required in the expansion of (xxxiv) are given in the last
column, the reductions of the higher moments having been made by (xxxiii) to
suit the case we are investigating, that of a normal distribution of the two
characters in the material sampled. Since the first and second moments only of r
are at present sought it is unnecessary to take products involving higher powers of
$\gamma$ than the second. The four additional formulae are inserted, however, to
complete the formulae for the third and fourth moments when required.


As an illustration, if in equation (xxix) we put
$$u = 0,\; v = 2,\; u' = 1,\; v' = 1,\; u'' = 1,\; v'' = 1,$$
we get
$$\text{mean } dp_{02}\,dp_{11}\,dp_{11} = \frac{1}{n^2}\,[p_{24} - p_{13}\,p_{11} - p_{13}\,p_{11} - p_{22}\,p_{02} + 2\,p_{02}\,p_{11}\,p_{11}].$$
Dividing by $p_{20}\,p_{02}^2$ and using (xxxiii) it follows that

$$\text{mean } \beta_2\gamma^2 = \frac{1}{n^2}\,[(3 + 12\rho^2) - 3\rho^2 - 3\rho^2 - (1 + 2\rho^2) + 2\rho^2] = \frac{1}{n^2}\,[2 + 6\rho^2],$$

the penultimate formula in table (xxxvi). It will be seen that the values to be
attributed to the terms signified by the suffixes between the double rules are the
right-hand sides of (xxxiii), the composition of the terms being shown in the last
column of the formula. For 24 put $3 + 12\rho^2$, for 13 put $3\rho$, for 11 put $\rho$, for 22 put
$1 + 2\rho^2$, for 02 put 1, and the formula for mean $\beta_2\gamma^2$ may be written down without
any division being necessary.
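The illustration can be repeated mechanically for any three deviations. The sketch below assumes standardised normal characters (so that the moment deviations are the α, β, γ of the text) and evaluates n² times the mean product three together by (xxix); the helper names are hypothetical.

```python
from fractions import Fraction
from math import comb

def normal_moment(j):
    """j-th moment of a unit normal deviate (zero for odd j)."""
    if j % 2:
        return 0
    out = 1
    for t in range(1, j, 2):
        out *= t
    return out

def p(u, v, rho):
    """Standardised bivariate normal product moment, series (xxxii)."""
    s = 1 - rho**2
    return sum(comb(v, k) * rho**(v - k) * normal_moment(u + v - k)
               * normal_moment(k) * s**(k // 2)
               for k in range(0, v + 1, 2))

def third(a, b, c, rho):
    """n^2 x mean product of three moment deviations with suffix pairs a, b, c,
    following formula (xxix)."""
    def add(s, t):
        return (s[0] + t[0], s[1] + t[1])
    abc = add(add(a, b), c)
    return (p(*abc, rho)
            - p(*add(a, b), rho) * p(*c, rho)
            - p(*add(a, c), rho) * p(*b, rho)
            - p(*add(b, c), rho) * p(*a, rho)
            + 2 * p(*a, rho) * p(*b, rho) * p(*c, rho))

rho = Fraction(2, 5)  # arbitrary rational correlation for an exact check
beta2_gamma2 = third((0, 2), (1, 1), (1, 1), rho)  # n^2 x mean of beta_2 gamma^2
```

The same call with suffixes `(2, 0), (0, 2), (1, 1)` gives the coefficient of the product of α₂, β₂ and γ.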


The formulae (xxxvii) are derived from those in (xxxv) and the suffixes between
the double rules are for reference to the first columns of the latter table. Thus in
the penultimate formula look up 02 11 in (xxxv) and find $2\rho$. Look up 11 11 and
find $1 + \rho^2$. $2\rho \times (1 + \rho^2) = 2\rho + 2\rho^3$, the first of the three component terms added
together in the last column of the formula.

[Tables (xxxv), (xxxvi), (xxxvii). Table (xxxvii) is headed by formula (xxxi),
mean $dp_{uv}\,dp_{u'v'}\,dp_{u''v''}\,dp_{u'''v'''}$ = mean $dp_{uv}\,dp_{u'v'}$ × mean $dp_{u''v''}\,dp_{u'''v'''}$ + (the two other pairings),
and its closing rows read:]

Suffixes uv, u'v', u''v'', u'''v'''     Product               n² × mean value
10  10  02  02                          α₁² β₂²               2
01  01  20  20                          α₂² β₁²               2
20  11  11  11                          α₂ γ³                 6ρ + 6ρ³
02  11  11  11                          β₂ γ³                 6ρ + 6ρ³
11  11  11  11                          γ⁴                    3 + 6ρ² + 3ρ⁴
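                                                                        (xxxvii).

We are now in a position to expand (xxxiv) to terms of the fourth degree in
$\alpha$, $\beta$ and $\gamma$, take mean values of the $\alpha$, $\beta$, $\gamma$ products as found for samples of normal
distributions and so obtain $\bar{r}$ the mean value of the correlation coefficient in
samples of n of such distributions correct to terms in $1/n^2$.

$$\begin{aligned}
\bar{r} + dr ={}& (\rho + \gamma - \alpha_1\beta_1)\\
&\times \left(1 - \tfrac12\alpha_2 + \tfrac12\alpha_1^2 + \tfrac38\alpha_2^2 - \tfrac34\alpha_1^2\alpha_2 - \tfrac{5}{16}\alpha_2^3 + \tfrac38\alpha_1^4 + \tfrac{15}{16}\alpha_1^2\alpha_2^2 + \tfrac{35}{128}\alpha_2^4\right)\\
&\times \left(1 - \tfrac12\beta_2 + \tfrac12\beta_1^2 + \tfrac38\beta_2^2 - \tfrac34\beta_1^2\beta_2 - \tfrac{5}{16}\beta_2^3 + \tfrac38\beta_1^4 + \tfrac{15}{16}\beta_1^2\beta_2^2 + \tfrac{35}{128}\beta_2^4\right)\\
={}& \rho - \tfrac12\{\rho(\alpha_2 + \beta_2) - 2\gamma\}\\
&+ \tfrac18\{4\rho(\alpha_1^2 + \beta_1^2) + 3\rho(\alpha_2^2 + \beta_2^2) + 2\rho\alpha_2\beta_2 - 4(\alpha_2\gamma + \beta_2\gamma) - 8\alpha_1\beta_1\}\\
&+ \tfrac{1}{16}\{-12\rho(\alpha_1^2\alpha_2 + \beta_1^2\beta_2) - 5\rho(\alpha_2^3 + \beta_2^3) - 4\rho(\alpha_1^2\beta_2 + \alpha_2\beta_1^2) - 3\rho(\alpha_2^2\beta_2 + \alpha_2\beta_2^2)\\
&\qquad + 8(\alpha_1^2\gamma + \beta_1^2\gamma) + 6(\alpha_2^2\gamma + \beta_2^2\gamma) + 4\alpha_2\beta_2\gamma + 8(\alpha_1\alpha_2\beta_1 + \alpha_1\beta_1\beta_2)\}\\
&+ \tfrac{1}{128}\{48\rho(\alpha_1^4 + \beta_1^4) + 120\rho(\alpha_1^2\alpha_2^2 + \beta_1^2\beta_2^2) + 35\rho(\alpha_2^4 + \beta_2^4)\\
&\qquad + 48\rho(\alpha_1^2\alpha_2\beta_2 + \alpha_2\beta_1^2\beta_2) + 20\rho(\alpha_2^3\beta_2 + \alpha_2\beta_2^3) + 32\rho\alpha_1^2\beta_1^2 + 24\rho(\alpha_1^2\beta_2^2 + \alpha_2^2\beta_1^2)\\
&\qquad + 18\rho\alpha_2^2\beta_2^2\\
&\qquad - 96(\alpha_1^2\alpha_2\gamma + \beta_1^2\beta_2\gamma) - 40(\alpha_2^3\gamma + \beta_2^3\gamma) - 32(\alpha_1^2\beta_2\gamma + \alpha_2\beta_1^2\gamma) - 24(\alpha_2^2\beta_2\gamma + \alpha_2\beta_2^2\gamma)\\
&\qquad - 64(\alpha_1^3\beta_1 + \alpha_1\beta_1^3) - 48(\alpha_1\alpha_2^2\beta_1 + \alpha_1\beta_1\beta_2^2) - 32\alpha_1\beta_1\alpha_2\beta_2\} \qquad \text{(xxxviii)}.
\end{aligned}$$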


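Whence taking mean values in samples of n of a normal distribution,

$$\begin{aligned}
\bar{r} = \rho &+ \frac{1}{8n}\{8\rho + 12\rho + 4\rho^3 - 16\rho - 8\rho\}\\
&+ \frac{1}{16n^2}\{-48\rho - 80\rho - 16\rho^3 - 48\rho^3 + 32\rho + 96\rho + 16\rho + 16\rho^3 + 32\rho\}\\
&+ \frac{1}{128n^2}\{288\rho + 480\rho + 840\rho + 192\rho^3 + 480\rho^3 + 32\rho + 64\rho^3 + 96\rho + 72\rho + 144\rho^5\\
&\qquad\quad - 384\rho - 960\rho - 128\rho - 192\rho - 384\rho^3 - 384\rho - 192\rho - 64\rho^3\}\\
= \rho &- \frac{1}{2n}\rho(1 - \rho^2) - \frac{3}{8n^2}\rho(1 - \rho^2)(1 + 3\rho^2) \qquad \text{(xxxix)},
\end{aligned}$$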


Or, expressing the result in terms of n − 1 (by changing n into n′ + 1 and
expanding) we may write to the same degree of approximation

$$\bar{r} = \rho\left[1 - \frac{1 - \rho^2}{2(n-1)}\left\{1 - \frac{1 - 9\rho^2}{4(n-1)}\right\}\right] \qquad \text{(xl)},$$

from which follows that
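$$1 - \bar{r}^2 = (1 - \rho^2)\left\{1 + \frac{\rho^2}{n-1} - \frac{\rho^2(1 - 5\rho^2)}{2(n-1)^2}\right\} \qquad \text{(xli)}.$$

And, again, by squaring (xxxiv) and expanding we obtain in the same way

$$\begin{aligned}
\bar{r}^2 + 2\bar{r}\,dr + (dr)^2 ={}& (\rho^2 + 2\rho\gamma + \gamma^2 - 2\rho\alpha_1\beta_1 - 2\alpha_1\beta_1\gamma + \alpha_1^2\beta_1^2)\\
&\times (1 - \alpha_2 + \alpha_1^2 + \alpha_2^2 - 2\alpha_1^2\alpha_2 - \alpha_2^3 + \alpha_1^4 + 3\alpha_1^2\alpha_2^2 + \alpha_2^4)\\
&\times (1 - \beta_2 + \beta_1^2 + \beta_2^2 - 2\beta_1^2\beta_2 - \beta_2^3 + \beta_1^4 + 3\beta_1^2\beta_2^2 + \beta_2^4)\\
={}& \rho^2 + \{2\rho\gamma - \rho^2\alpha_2 - \rho^2\beta_2\}\\
&+ \{\gamma^2 - 2\rho(\alpha_1\beta_1 + \alpha_2\gamma + \beta_2\gamma) + \rho^2(\alpha_1^2 + \beta_1^2 + \alpha_2^2 + \beta_2^2 + \alpha_2\beta_2)\}\\
&+ \{-2\alpha_1\beta_1\gamma - \alpha_2\gamma^2 - \beta_2\gamma^2 + 2\rho(\alpha_1\alpha_2\beta_1 + \alpha_1\beta_1\beta_2 + \alpha_1^2\gamma + \beta_1^2\gamma + \alpha_2^2\gamma + \beta_2^2\gamma + \alpha_2\beta_2\gamma)\\
&\qquad + \rho^2(-2\alpha_1^2\alpha_2 - 2\beta_1^2\beta_2 - \alpha_2^3 - \beta_2^3 - \alpha_1^2\beta_2 - \alpha_2\beta_1^2 - \alpha_2^2\beta_2 - \alpha_2\beta_2^2)\}\\
&+ \{\alpha_1^2\beta_1^2 + 2\alpha_1\alpha_2\beta_1\gamma + 2\alpha_1\beta_1\beta_2\gamma + \alpha_1^2\gamma^2 + \beta_1^2\gamma^2 + \alpha_2^2\gamma^2 + \beta_2^2\gamma^2 + \alpha_2\beta_2\gamma^2\\
&\qquad + 2\rho(-\alpha_1^3\beta_1 - \alpha_1\beta_1^3 - \alpha_1\alpha_2^2\beta_1 - \alpha_1\beta_1\beta_2^2 - \alpha_1\alpha_2\beta_1\beta_2\\
&\qquad\qquad - 2\alpha_1^2\alpha_2\gamma - 2\beta_1^2\beta_2\gamma - \alpha_2^3\gamma - \beta_2^3\gamma - \alpha_1^2\beta_2\gamma - \alpha_2\beta_1^2\gamma - \alpha_2^2\beta_2\gamma - \alpha_2\beta_2^2\gamma)\\
&\qquad + \rho^2(\alpha_1^4 + \beta_1^4 + 3\alpha_1^2\alpha_2^2 + 3\beta_1^2\beta_2^2 + \alpha_2^4 + \beta_2^4 + 2\alpha_1^2\alpha_2\beta_2 + 2\alpha_2\beta_1^2\beta_2\\
&\qquad\qquad + \alpha_2^3\beta_2 + \alpha_2\beta_2^3 + \alpha_1^2\beta_1^2 + \alpha_1^2\beta_2^2 + \alpha_2^2\beta_1^2 + \alpha_2^2\beta_2^2)\} \qquad \text{(xlii)}.
\end{aligned}$$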
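And taking mean values in samples of n of a normal distribution,
$$\begin{aligned}
\bar{r}^2 + \text{mean } (dr)^2 = \rho^2 &+ \frac{1}{n}\{1 + \rho^2 - 2\rho(\rho + 4\rho) + \rho^2(2 + 4 + 2\rho^2)\}\\
&+ \frac{1}{n^2}\{-2 - 2\rho^2 - 4 - 12\rho^2 + 2\rho(4\rho + 4\rho + 16\rho + 4\rho + 4\rho^3)\\
&\qquad\quad + \rho^2(-8 - 16 - 4\rho^2 - 16\rho^2)\}\\
&+ \frac{1}{n^2}\{1 + 2\rho^2 + 8\rho^2 + 2 + 2\rho^2 + 4 + 20\rho^2 + 10\rho^2 + 2\rho^4\\
&\qquad\quad + 2\rho(-6\rho - 4\rho - 2\rho^3 - 8\rho - 24\rho - 4\rho - 8\rho - 16\rho^3)\\
&\qquad\quad + \rho^2(6 + 12 + 24 + 8\rho^2 + 24\rho^2 + 1 + 2\rho^2 + 4 + 8\rho^4 + 4)\}\\
= \rho^2 &+ \frac{1}{n}\{1 - 3\rho^2 + 2\rho^4\} + \frac{1}{n^2}\{-6 + 18\rho^2 - 12\rho^4\} + \frac{1}{n^2}\{7 - 15\rho^2 + 8\rho^6\}\\
= \rho^2 &+ \frac{1}{n}(1 - \rho^2)(1 - 2\rho^2) + \frac{1}{n^2}(1 - \rho^2)(1 + 4\rho^2 - 8\rho^4).
\end{aligned}$$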
And by squaring (xxxix),

$$\bar{r}^2 = \rho^2 - \frac{1}{n}\rho^2(1 - \rho^2) - \frac{1}{2n^2}\rho^2(1 - \rho^2)(1 + 5\rho^2).$$
Hence by subtraction,
$$\text{mean } (dr)^2 = \frac{1}{n}(1 - \rho^2)^2 + \frac{1}{n^2}(1 - \rho^2)^2(1 + 5\tfrac12\rho^2).$$
Or taking the square root

$$\sigma_r = \frac{1 - \rho^2}{\sqrt{n}}\left\{1 + \frac{1 + 5\frac12\rho^2}{2n}\right\},$$
which may be expressed in like manner with $\bar{r}$ in the form
$$\sigma_r = \frac{1 - \rho^2}{\sqrt{n-1}}\left\{1 + \frac{11\rho^2}{4(n-1)}\right\} \qquad \text{(xliii)},$$

to the same degree of approximation.


It appears then from the above results that if the coefficient of correlation
existing between two measured characters in a large aggregate of individuals be
computed from the product moment values in small samples, these values are
subject to errors from a mean value, the standard deviation of which errors may
be very approximately represented by the formula

$$\frac{1 - \rho^2}{\sqrt{n-1}},$$

and with greater degree of accuracy by the formula

$$\frac{1 - \rho^2}{\sqrt{n-1}}\left(1 + \frac{11\rho^2}{4n}\right),$$

$\rho$ being the coefficient of correlation between the characters in the material
sampled and n being the number in the sample.

Moreover the mean value of the correlation coefficients obtained from such
small samples will be less than the true coefficient of the aggregate and will be
approximately represented by the formula

$$\rho\left(1 - \frac{1 - \rho^2}{2n}\right),$$

the defect being very small when $\rho$ is large, and when $\rho$ is small being of the order
5 % in samples of 10 and .5 % in samples of 100.

On the other hand the modal value of the correlation coefficients, or the most
likely value in a single sample, will be greater than the true correlation coefficient
(that is to say numerically greater: the correlation being supposed to be measured
positively).

We have, by definition $1/\lambda = \dfrac{\sigma_r^2}{1 - \bar{r}^2}$
and so from equations (xli), (xliii) putting n − 1 = n′
$$\frac{1}{\lambda} = \frac{1 - \rho^2}{n'}\left(1 + \frac{11\rho^2}{2n'}\right)\left(1 + \frac{\rho^2}{n'}\right)^{-1}$$
using second approximations.

And hence from (x) going now to third approximations,

$$\begin{aligned}
\tilde{r} &= \rho\left\{1 - \frac{1 - \rho^2}{2n'} + \frac{(1 - \rho^2)(1 - 9\rho^2)}{8n'^2}\right\}\,\frac{1 - \dfrac{1 - \rho^2}{n'}\left(1 + \dfrac{9\rho^2}{2n'}\right)}{1 - \dfrac{3(1 - \rho^2)}{n'}\left(1 + \dfrac{9\rho^2}{2n'}\right)}\\
&= \rho\left\{1 + \frac{3(1 - \rho^2)}{2n'} + \frac{(41 + 23\rho^2)(1 - \rho^2)}{8n'^2}\right\}.
\end{aligned}$$

The excess of $\tilde{r}$ over true $\rho$ is zero if $\rho = 0$ and if $\rho = 1$, but if n is small and $\rho$
fairly large the excess may be such as to make the modal value unity or greater
than unity. If for instance n is so small as 4, n′ being thus 3, the above
approximate equation gives

$$\tilde{r} = \rho + \tfrac12\rho(1 - \rho^2) + \left(\tfrac{41}{72}\rho + \tfrac{23}{72}\rho^3\right)(1 - \rho^2)$$

or $\tilde{r} = 2.069\rho - .750\rho^3 - .319\rho^5$
 = .93 when $\rho$ = .5
 = 1.05 when $\rho$ = .6
 = 1.14 when $\rho$ = .7.
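The three numerical values can be reproduced from the third approximation to the modal value, ρ{1 + 3(1 − ρ²)/2n′ + (41 + 23ρ²)(1 − ρ²)/8n′²} with n′ = n − 1. A minimal sketch follows; the function name `modal_r` is illustrative and not the paper's notation.

```python
def modal_r(rho, n):
    """Third-approximation modal value of r in samples of n, with n' = n - 1."""
    n1 = n - 1
    c = 1 - rho**2
    return rho * (1 + 3 * c / (2 * n1) + (41 + 23 * rho**2) * c / (8 * n1**2))

def modal_r_poly(rho):
    """The rounded polynomial quoted in the text for samples of four."""
    return 2.069 * rho - 0.750 * rho**3 - 0.319 * rho**5

values = [round(modal_r(rho, 4), 2) for rho in (0.5, 0.6, 0.7)]
```

Values at or above unity indicate that the fitted curve has no mode inside the range of r for that ρ and n.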


The frequency distribution in the last two cases is of the J type, there being
no mode within the range. The greatest frequency is at the extremity of the range,
or at value unity. The interpretation of this result is clearly that such small
samples as 3, 4 or 5, as might be expected, fail altogether to give by the product
moment formula an approximation to the correlation coefficient. Under some
circumstances the points which graphically represent the observed measures are
more likely to be in a line than to have a configuration represented by any
specified fractional correlation coefficient. This will happen if the correlation in
the material has a larger coefficient than .6 (approx.) when samples of four are
drawn: or a larger coefficient than .3 (approx.) when samples of three are drawn. If
samples of two are drawn the coefficient of correlation is necessarily unity in the
sample whatever it may be in the material*. All the distribution is concentrated
at value unity and $\tilde{r}$ should in this case be infinite for all values of $\rho$. Our
approximation, neglecting terms in $1/n^3$ etc., cannot of course show this if n′ = 1.
It gives $\tilde{r}$ greater than unity and so a J type, but fails to show the complete
concentration at unity.


* Supposing the material ungrouped. If it is grouped some values will be indeterminate in small
samples, viz. when all observations fall into the same group.

It appears from (x) that $\tilde{r}$ will be infinite when $\lambda = 3$ and $\bar{r}$ any value other
than zero, whilst $\tilde{r}$ will be zero if $\bar{r}$ is zero and $\lambda$ other than 3. If $\bar{r}$, and therefore
$\rho$, is zero and $\lambda = 3$, $\tilde{r}$ is indeterminate. This case is a little singular and (ix)
shows that $m_1$ and $m_2$ are zero and that the frequency is therefore constant or
that all values of r are equally likely, from −1 to +1. Remembering that
$\lambda = \dfrac{1 - \bar{r}^2}{\sigma_r^2}$ it will happen when $\sigma_r^2 = \tfrac13$. But when $\rho = 0$ the value we have found
for $\sigma_r^2$ is $\dfrac{1}{n-1}$, which will be $\tfrac13$ when n = 4. If therefore from material possessing
zero correlation samples of four are drawn, the values of the correlation coefficients
in these samples should, if the above formulae and assumed type of frequency
distribution are correct, be equally distributed in value throughout the whole
range.


Complete experimental confirmation of the distributions found in this paper
for the product moment expression of the coefficient of correlation in small samples
is difficult to obtain. The second order differences in the values of the mean and
standard deviation of the distribution are necessarily small in large samples and
comparable with the errors of sampling of such samples unless a very large number
of samples is taken. On the other hand if small samples are taken the theory is
not so exact and in addition the distribution tends to concentrate in certain
grades if the original material is coarsely grouped, whilst fine grouping adds very
much to the labour of computation.

In a paper published in Biometrika, Vol. VI., p. 302, "Student" gives the
results of a very painstaking investigation to determine experimentally these
distributions. Material based on W. R. Macdonell's measures of finger length and
stature in 3000 criminals [Biometrika, Vol. I, p. 219] was formed having correlation
with coefficient* .66 and samples of 4, 8 and 30 were drawn. Material with
correlation zero was also constructed and samples of 4 and 8 drawn.


The comparison of the actual means and standard deviations found in these 
experiments with those computed from the formulae in this paper is shown in the 
table that follows. 





                                                                   1       2       3       4       5
Number in sample, n ...........................................    4       8       4       8      30
Correlation coefficient in material, ρ ........................    0       0     .66     .66     .66
Mean correlation coefficient in samples, observed .............    -       -   .5609   .6139    .661
  "        "            "           "     calculated, r̄ .......    0       0   .5933   .6317   .6542
Standard deviation of coefficient in samples, observed ........ .5512   .3731   .4680   .2684   .1001
  "        "            "           "     calculated, σ_r ..... .5773   .3780   .4234   .2453   .1088
Number of samples taken .......................................  745     750     745     750     100
Probable error of mean ........................................ .0143   .0093   .0073   .0105   .0060
  "        "     standard deviation ........................... .0101   .0066   .0052   .0074   .0043

* The exact figure was .6608.

It will be seen that the differences of observed and calculated values are for
the most part several times the probable errors. Estimated in this way case (3)
$\rho$ = .66, n = 4, is the worst fit, the difference of the mean being five times and that
of the standard deviation nine times the probable errors.
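The calculated entries for samples of 4 and 8 can be reproduced numerically. The sketch below assumes that the mean was computed from (xxxix) and the standard deviation from the form (1 − ρ²)/√(n − 1) · (1 + 11ρ²/4n), which is an inference from the tabulated figures rather than something the paper states; the function names are illustrative.

```python
from math import sqrt

def mean_r(rho, n):
    """Mean correlation coefficient in samples of n, formula (xxxix)."""
    c = 1 - rho**2
    return rho * (1 - c / (2 * n) - 3 * c * (1 + 3 * rho**2) / (8 * n**2))

def sd_r(rho, n):
    """Standard deviation of r in samples of n, second approximation
    (1 - rho^2)/sqrt(n - 1) * (1 + 11 rho^2 / (4 n))."""
    return (1 - rho**2) / sqrt(n - 1) * (1 + 11 * rho**2 / (4 * n))

# (rho, n) pairs of the first four columns of the comparison table
cases = [(0.0, 4), (0.0, 8), (0.66, 4), (0.66, 8)]
table = [(round(mean_r(r, n), 4), round(sd_r(r, n), 4)) for r, n in cases]
```

The column for samples of 30 appears to have been computed with the exact figure .6608 for ρ, so it is not asserted here.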


From the values of $\bar{r}$ and $\sigma_r$ found above are calculated

                                   1        2        3        4        5
λ = (1 − r̄²)/σ_r² ............     3        7     3.6140   9.9905   48.325
r̃ = r̄ (λ − 1)/(λ − 3) ........    0/0       0     2.5259    .8124    .6831

The frequency distributions of Type I to fit the numbers of samples taken in
the experiments and the values of $\bar{r}$ and $\sigma_r$ calculated will be of the form

$$y = y_0\,(1 - r)^{m_1}(1 + r)^{m_2}$$

when referred to the absolute origin and unit of measurement of r, and the
constants* will be

                                   1        2        3        4        5
m₁ = ½(λ − 1)(1 − r̄) − 1 .....     0        2    −.46844    .6557   7.1825
m₂ = ½(λ − 1)(1 + r̄) − 1 .....     0        2    1.08243   6.3348  38.1425


y₀ = N Γ(m₁ + m₂ + 2) / {2^(m₁+m₂+1) Γ(m₁ + 1) Γ(m₂ + 1)}
                                 372.5    703.1    202.9    95.13  .002389
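The constants of the fitted Type I curves follow mechanically from the calculated mean and standard deviation. The sketch below assumes the mean from (xxxix), the standard deviation (1 − ρ²)/√(n − 1) · (1 + 11ρ²/4n), and the usual normalisation of y₀ so that the area of the curve from −1 to +1 equals the number of samples N; all function names are illustrative.

```python
from math import gamma, sqrt

def mean_r(rho, n):
    """Mean of r in samples of n, formula (xxxix)."""
    c = 1 - rho**2
    return rho * (1 - c / (2 * n) - 3 * c * (1 + 3 * rho**2) / (8 * n**2))

def sd_r(rho, n):
    """Standard deviation of r, second approximation (assumed form)."""
    return (1 - rho**2) / sqrt(n - 1) * (1 + 11 * rho**2 / (4 * n))

def type1_constants(rho, n, N):
    """Return (lambda, m1, m2, y0) for y = y0 (1 - r)^m1 (1 + r)^m2 on (-1, +1)."""
    r, s = mean_r(rho, n), sd_r(rho, n)
    lam = (1 - r * r) / (s * s)
    m1 = 0.5 * (lam - 1) * (1 - r) - 1
    m2 = 0.5 * (lam - 1) * (1 + r) - 1
    # y0 chosen so that the total area under the curve is N
    y0 = N * gamma(m1 + m2 + 2) / (2 ** (m1 + m2 + 1) * gamma(m1 + 1) * gamma(m2 + 1))
    return lam, m1, m2, y0

case4 = type1_constants(0.66, 8, 750)  # rho = .66, n = 8, 750 samples
```

For ρ = 0 and n = 4 the same function gives m₁ = m₂ = 0, the rectangular distribution discussed in the text.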





When the above frequency curves are plotted they appear as shown on the
diagrams pp. 112, 113 and are seen to be in fair consonance with the frequencies
observed in the experiments and shown by the rectangles upon the same diagrams.
They are perhaps as good an expression of these frequencies as could be found
amongst the type of frequency curve assumed. The case of $\rho$ = 0, n = 4, for which
theory prescribes a horizontal straight line is seen to be very nearly so in the
experiment, apart from individual fluctuations. In $\rho$ = 0, n = 8, the curve well fits
the deviations from zero correlation observed. In $\rho$ = .66, n = 4, the asymptotic
nature of the distribution towards the value unity which the fitted curve foreshadows
is borne out in the samples of four drawn. With larger samples from the
same class of material the displacement of the mode and the skewness of the
distribution resulting from the assumed types are corroborated in the tests.


* For the special case of $\rho$ = 0, eqns. (ix), (xl) and (xliii) show that the curve is $y = y_0\,(1 - r^2)^{\frac{n-4}{2}}$,
the form suggested for this case by "Student," Biometrika, Vol. VI. p. 306.

At the same time it must be admitted that there are considerable differences
needing to be accounted for. The individual irregularities in the observations are
more than would be expected in random samples of homogeneous material and it
is probable that these jumps which make a good fit of any continuous curve whatever
an impossibility are partly due to grouping. If the grouping of the original
material were too coarse there would be a tendency in small samples for statistical
constants to centre round certain values. Another possible source of error is in
the mixing and in the drawing out of samples. Although a great deal of trouble
was undoubtedly taken in these experiments, yet there always seems room for a
little involuntary order in repetitions intended to go solely by chance.


The curves were planimetered and the following tables show the comparison of
theoretical with observed frequencies of the grade values. It will be seen that
the differences are not systematic but that the + and − errors are fairly mixed.
On calculating the square contingency, $\chi^2$, and deducing from this and the number
of groups, n′, the probability, P, of these differences being purely that of sampling*
such probability comes out in most cases very small. It seems legitimate in this
instance to see to what extent grouping will smooth down the irregularities and
yet show the general resemblance and, with this in view, the differences in the
columns headed e′ are calculated and the grouping therein shown may be taken to
indicate what is necessary to bring the probabilities within reasonable distance of
expectation.















































                       ρ = 0, n = 4.                                  ρ = 0, n = 8.

Group (lower     Calculated   Observed   Difference   e²/m     Calculated   Observed   Difference   e²/m
limit)           frequency m  frequency      e                 frequency m  frequency      e
 .925 to 1.0        27.94       22.5      -  5.44     1.06         .6          -        -   .6       .60
 .825               37.25       31.5      -  5.75      .89        4.1          3        -  1.1       .30
 .725               37.25       24.0      - 13.25     4.71       11.6         12        +   .4       .01
 .625               37.25       35.0      -  2.25      .14       20.6         11.5      -  9.1      4.02
 .525               37.25       34.0      -  3.25      .28       31.9         28.5      -  3.4       .36
 .425               37.25       47.0      +  9.75     2.55       42.3         46        +  3.7       .32
 .325               37.25       30.5      -  6.75     1.22       51.6         47.5      -  4.1       .33
 .225               37.25       46.5      +  9.25     2.30       59.9         70        + 10.1      1.70
 .125               37.25       44.0      +  6.75     1.22       65.8         57.5      -  8.3      1.05
 .025               37.25       32.0      -  5.25      .74       69.4         70        +   .6       .01
-.075               37.25       45.0      +  7.75     1.61       70.3         60.5      -  9.8      1.37
-.175               37.25       43.0      +  5.75      .89       67.9         71.5      +  3.6       .19
-.275               37.25       41.0      +  3.75      .38       63.1         76        + 12.9      2.64
-.375               37.25       37.0      -   .25      .00       56.3         63        +  6.7       .80
-.475               37.25       44.0      +  6.75     1.22       47.1         42        -  5.1       .55
-.575               37.25       40.0      +  2.75      .20       36.9         33        -  3.9       .41
-.675               37.25       38.5      +  1.25      .04       26.1         29        +  2.9       .32
-.775               37.25       32.5      -  4.75      .61       15.3         20        +  4.7      1.44
-.875               37.25       36.0      -  1.25      .04        7.4          8        +   .6       .05
-1 to -.875         46.56       41.0      -  5.56      .83        1.8          1        -   .8       .36
Totals                         745                   20.93                   750                   16.83

n′ = 20, χ² = 20.93, P = .341.                        n′ = 20, χ² = 16.83, P = .601.


* Tables for testing the goodness of fit, W. Palin Elderton, Biometrika, Vol. 1. 
           | Calculated  | Observed
           | frequency m | frequency
 .905–1.0  | 230.3       | 175
 .805–     |  98.9       | 136.5
 .705–     |  72.1       |  84
 .605–     |  57.6       |  66
 .505–     |  48.0       |  55
 .405–     |  40.2       |  45
 .305–     |  34.3       |  24.5
 .205–     |  29.7       |  24.5
 .105–     |  25.6       |  19
           |  22.0       |   7
           |  18.8       |  22
           |  16.0       |  12
           |  13.5       |  13
           |  11.2       |  [?]
           |   9.0       |  12
           |   6.9       |  16
           |   5.1       |   7
           |   3.3       |  10
           |   1.9       |   4
           |    .6       |   9
 Total     |             | 745

 n′ = 10, χ² = 84.17, P very small.
 n′ = 4, P = .237.


                  ρ = .66, n = 8.

           | Calculated  | Observed
           | frequency m | frequency
 .925–1.0  |  48.9       |  37.5
 .825–     | 127.6       | 126
 .725–     | 135.25      | 187
 .625–     | 120.0       | 125.5
 .525–     |  97.25      | 105
 .425–     |  73.9       |  80.5
 .325–     |  53.5       |  39
 .225–     |  37.0       |  37
 .125–     |  24.25      |  16
 .025–     |  14.75      |  14
 1̄.925–    |   8.5       |   9
 1̄.825–    |   4.75      |   7
 1̄.725–    |   2.25      |   9
 1̄.625–    |    .9       |   3
 1̄.525–    |    .25      |   2
           |  [?]        |   2

 χ² = 85.92, P very small.
 χ² = 2.85, P = .722.
                  ρ = .66, n = 30.

           | Calculated  | Observed  | Difference | e²/m
           | frequency m | frequency | e          |
 .795–     |  8.5        |  6        | −2.5       |  .74
 .735–     | 15.8        | 17        | +1.2       |  .09
 .675–     | 21.6        | 27        | +5.4       | 1.35
 .615–     | 20.9        | 22        | +1.1       |  .06
 .555–     | 15.8        | 14        | −1.8       |  .21
 .495–     |  9.8        | 10        | + .2       |  .04
 1̄–.495    |  7.6        |  4        | −3.6       | 1.71
 Totals    |             | 100       |            | 4.20

 n′ = 7, χ² = 4.20, P = .650.
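
The test just tabulated can be retraced numerically. The following is a minimal sketch, not the computation actually used for the paper: it sums e²/m over the groups of the ρ = .66, n = 30 table and turns χ² into P by the finite series for the chi-square tail, on the assumption (consistent with the printed figures) that the tables are entered with n′ − 1 degrees of freedom. The function names are illustrative only.

```python
from math import erfc, exp, pi, sqrt

def chi_square(calculated, observed):
    """Sum of e^2/m over the groups, where e = observed - calculated."""
    return sum((o - m) ** 2 / m for m, o in zip(calculated, observed))

def tail_probability(chi2, groups):
    """P that random sampling gives a chi-square at least as large.

    Assumes groups - 1 degrees of freedom and uses the standard finite
    series for the upper tail (separate forms for even and odd degrees).
    """
    df = groups - 1
    chi = sqrt(chi2)
    if df % 2 == 0:
        term, total = 1.0, 1.0
        for r in range(1, df // 2):
            term *= chi2 / (2 * r)
            total += term
        return exp(-chi2 / 2) * total
    term = chi
    total = chi if df >= 3 else 0.0
    for r in range(2, (df - 1) // 2 + 1):
        term *= chi2 / (2 * r - 1)
        total += term
    density = exp(-chi2 / 2) / sqrt(2 * pi)   # normal ordinate at chi
    return erfc(chi / sqrt(2)) + 2 * density * total

# Calculated and observed frequencies from the rho = .66, n = 30 table.
calculated = [8.5, 15.8, 21.6, 20.9, 15.8, 9.8, 7.6]
observed = [6, 17, 27, 22, 14, 10, 4]

chi2 = chi_square(calculated, observed)
print(round(chi2, 2), round(tail_probability(4.20, 7), 3))
```

Fed with the printed χ² = 4.20 and n′ = 7 the series gives a P agreeing with the tabled .650 to three places; the unrounded sum of e²/m from the frequencies is slightly below 4.20.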


It is hoped that further experiments may be shortly carried out which will 
have regard to the points raised and show definitely whether the distributions 
theoretically arrived at in this paper are good presentations of fact and whether 
the application of the standard types of frequency curves to the distributions of 
statistical constants in small samples is justified. 


I am indebted to Professor Pearson for drafting the lines of this investigation 
and for critical supervision. 











ON THE MEASUREMENT OF THE INFLUENCE OF
“BROAD CATEGORIES” ON CORRELATION.


By KARL PEARSON, F.R.S.


(1) By a “broad category” I understand one of a finite small number of groups
into which we class a variable. For example: we may divide General Health into the 
categories Very Robust, Robust, Normally Healthy, Rather Delicate, Delicate and 
Very Delicate. These categories may be verbally defined or have their boundaries 
determined by quantitative limits as when we state that the limits of the Delicate 
coincide with so many weeks of sickness or of absence from work in the year. 
Again we may put into four to six classes the competitors in an examination, and 
the boundaries to these classes may be really percentages of marks gained. Such 
broad categories are very common not only in social investigations, but also in 
psychological records, and quite recently Dr G. A. Jaederholm, a Swedish 
psychologist, wrote to me asking what was the correlation between the true 
quantitative value of a variate in any individual and that individual’s category
or class-mark. The answer is an obvious one, but I do not know that I have 
seen it stated or any discussion of it given. It of course assumes that at the back 
of the categorical classification a true quantitative value lies. 


Suppose a population of $N$ individuals divided into $p$ classes, and that $C_s$ is
the class-mark of an individual in the $s$th category, whose true variate is $x$.
The problem is what is the correlation of $x$ and the class-mark. Let $\bar{x}_s$ be the
mean variate of the group of $n_s$ individuals who fall into the $s$th class. Then it
is just as reasonable to call $\bar{x}_s$ the class-mark as $C_s$, for given one the other is fixed.
We really want then the correlation of $x$ and $\bar{x}_s$.

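We may either find this directly* or indirectly, and the latter is the easier
course. Clearly if $\tilde{x}$ be the mean value of $x$ for a given class, then

$$\tilde{x} = \bar{x}_s, \quad\text{or}\quad \tilde{x} - \bar{x} = \bar{x}_s - \bar{x},$$

is the equation to the regression line; therefore

$$\tilde{x} - \bar{x} = r_{xC_x}\,\frac{\sigma_x}{\sigma_{\bar{x}_s}}\,(\bar{x}_s - \bar{x}).$$

Whence it follows that

$$r_{xC_x} = \frac{\sigma_{\bar{x}_s}}{\sigma_x} \tag{i}$$

or the correlation of a variate with its class-mark is the ratio of the standard
deviation of the means of class-marks to the standard deviation of the variate.

* Let $x$ within the class $= \bar{x}_s + x_s'$, $S$ = sum for classes and $\Sigma$ = sum within class. Then

$$S\Sigma(n_{sx}\,x\,\bar{x}_s) = S\Sigma\{n_{sx}\,\bar{x}_s(\bar{x}_s + x_s')\} = S(n_s\bar{x}_s^2) + S\{\bar{x}_s\,\Sigma(n_{sx}\,x_s')\},$$

but $\Sigma(n_{sx}\,x_s') = 0$, so that $S\Sigma(n_{sx}\,x\,\bar{x}_s) = S(n_s\bar{x}_s^2)$, and

$$r_{xC_x} = \frac{S\Sigma(n_{sx}\,x\,\bar{x}_s)}{N\sigma_{\bar{x}_s}\sigma_x} = \frac{S(n_s\bar{x}_s^2)}{N\sigma_{\bar{x}_s}\sigma_x} = \frac{\sigma_{\bar{x}_s}^2}{\sigma_{\bar{x}_s}\sigma_x} = \sigma_{\bar{x}_s}/\sigma_x$$

as before.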


It follows from this that in classifying a variable into broad categories the test 
of their efficiency, as far as number and arrangement are concerned, lies in the 
standard deviation of their means not differing widely from the standard deviation 
of the variate itself. 


(2) Clearly

$$\sigma_{\bar{x}_s}^2 = S(n_s\bar{x}_s^2)/N \tag{ii}$$

where $S$ is a sum involving all classes.


If before classifying into broad categories we have made a quantitative
determination of these classes on a sample frequency, the values of the $\bar{x}_s$’s can be
determined. If this has not been done, or cannot be done, we are bound to
assume some form of frequency distribution. Suppose we take a normal distribution,
then we know that

$$n_s\bar{x}_s = \frac{N}{\sqrt{2\pi}\,\sigma_x}\int_{x_s}^{x_{s+1}} x\,e^{-x^2/2\sigma_x^2}\,dx = N\sigma_x(z_s - z_{s+1}),$$

where

$$z = \frac{1}{\sqrt{2\pi}}\,e^{-\frac12 (x/\sigma_x)^2},$$

and can be found from Sheppard’s Tables of the ordinates of the normal curve
as soon as $n_s$, etc. are known, for the $z$’s are the ordinates at start and finish of the
$s$th class, reduced by the factor $N/\sigma_x$.

Hence

$$\sigma_{\bar{x}_s}^2 = \sigma_x^2\,S\left\{\frac{N}{n_s}(z_s - z_{s+1})^2\right\},$$

or

$$r_{xC_x} = \sqrt{S\left\{\frac{N}{n_s}(z_s - z_{s+1})^2\right\}} \tag{iii}$$

Thus $r_{xC_x}$ can be found at once from Sheppard’s Tables, when the totals of the
broad classes are known.
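
Formula (iii) needs only the class totals, and in place of Sheppard’s Tables the ordinates can be taken from a library routine. The following is a minimal sketch under the Gaussian assumption of the text; the function name `class_mark_correlation` is illustrative, and the standard-library `statistics.NormalDist` supplies the normal integral and ordinate.

```python
from math import sqrt
from statistics import NormalDist

def class_mark_correlation(frequencies):
    """Correlation of a Gaussian variate with its class-mark, formula (iii).

    `frequencies` are the class totals n_s taken in order along the scale.
    """
    unit = NormalDist()                  # unit normal curve
    total = sum(frequencies)
    cumulative = 0.0
    z_start = 0.0                        # ordinate at the open lower end
    s = 0.0
    for i, n in enumerate(frequencies):
        cumulative += n
        is_last = i == len(frequencies) - 1
        # ordinate at the upper boundary of the class (zero at the open end)
        z_end = 0.0 if is_last else unit.pdf(unit.inv_cdf(cumulative / total))
        s += (z_start - z_end) ** 2 * total / n
        z_start = z_end
    return sqrt(s)

# Three classes of equal frequency, and two equal halves.
print(round(class_mark_correlation([1, 1, 1]), 3))
print(round(class_mark_correlation([1, 1]), 3))
```

Only the proportions matter, so the totals may be given as counts or as fractions of $N$.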


(3) Let us now suppose a second variate $y$ and assume that the correlation
of $x$ and $y$ for practical purposes is linear. Then clearly since a given $x$ will have
a constant class-mark, the correlation of $y$ and $C_x$ for a constant $x$ is zero; that is
to say that the partial correlation coefficient

$${}_x\rho_{yC_x} = \frac{r_{yC_x} - r_{xy}\,r_{xC_x}}{\sqrt{1 - r_{xy}^2}\,\sqrt{1 - r_{xC_x}^2}} = 0.$$

Hence $r_{yC_x} - r_{xy}\,r_{xC_x} = 0$, or

$$r_{xy} = \frac{r_{yC_x}}{r_{xC_x}} \tag{iv}$$


In other words, to find the true correlation of $x$ and any other variate $y$, divide
the correlation of $y$ and the class-mark of $x$, by the correlation of $x$ with its
class-mark.
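
Rule (iv) can be checked by an artificial sampling experiment. The following is a sketch only: the sample size, the true correlation of .5, the two class limits and the seed are all arbitrary assumptions made for the illustration, and the class means of the sample are used as class-marks.

```python
import random
from math import sqrt

def correlation(u, v):
    """Ordinary product-moment correlation of two equal-length lists."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / sqrt(suu * svv)

rng = random.Random(1)            # fixed seed, so the run is repeatable
true_r = 0.5                      # assumed correlation of x and y
xs, ys = [], []
for _ in range(50000):
    x = rng.gauss(0, 1)
    y = true_r * x + sqrt(1 - true_r ** 2) * rng.gauss(0, 1)
    xs.append(x)
    ys.append(y)

limits = (-0.5, 0.7)              # arbitrary boundaries of three broad classes

def category(x):
    """Index of the broad class into which x falls."""
    return sum(x >= b for b in limits)

# Class-mark of each individual = mean x of the class it falls in.
groups = {}
for x in xs:
    groups.setdefault(category(x), []).append(x)
means = {c: sum(v) / len(v) for c, v in groups.items()}
marks = [means[category(x)] for x in xs]

r_yC = correlation(ys, marks)     # correlation of y with class-mark of x
r_xC = correlation(xs, marks)     # correlation of x with its class-mark
corrected = r_yC / r_xC           # formula (iv)
direct = correlation(xs, ys)      # from the ungrouped values
print(round(r_yC, 3), round(corrected, 3), round(direct, 3))
```

The uncorrected $r_{yC_x}$ falls short of the ungrouped value, while the quotient (iv) should agree with it to within the error of sampling.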


From (iv) we can deduce the correction for number of arrays when we find $\eta_{xy}$
the correlation ratio of $x$ and $y$ on the supposition that $r_{xy}$ equals sufficiently
closely $\eta_{xy}$, i.e. when we “find $r$ by $\eta$ methods.” Let $H_{yC_x}$ equal the value of $\eta$
found when $y$ is finely classified and the mean of an array can be determined for $y$,
but the arrays of $x$ are broad classes*. Then

$$\eta_{xy} = \frac{H_{yC_x}}{r_{xC_x}} = \frac{H_{yC_x}}{\sqrt{S(n_s\bar{x}_s^2)/(N\sigma_x^2)}} \tag{v}$$

$$= \frac{H_{yC_x}}{\sqrt{S\left\{\dfrac{N}{n_s}(z_s - z_{s+1})^2\right\}}} \tag{vi}$$

if we suppose the distribution normal.


(4) Now let us consider two variables $x$ and $y$ given by their class-marks $C_x$
and $C_y$. If we correlate $y$ with any variable $w$ we have at once by (iv), taking
$w$ to be the class-mark $C_x$,

$$r_{yC_x} = \frac{r_{C_xC_y}}{r_{yC_y}} \tag{viii}$$

Substitute this in (iv) and we have†

$$r_{xy} = \frac{r_{C_xC_y}}{r_{xC_x}\,r_{yC_y}} \tag{ix}$$


* The primary establishment of equation (v) is due to “Student.” His paper published in this
number reached me, as I was writing this paper. I had obtained (iv) and (ix) and used them to correct
contingency, but not to correct $\eta$.

† A somewhat different proof of this formula may be obtained as follows: the partial correlation
${}_{xy}r_{C_xC_y}$ is clearly zero, for when $x$ and $y$ are constant $C_x$ and $C_y$ do not vary. Hence

$$r_{C_xC_y}(1 - r_{xy}^2) - r_{xC_x}r_{xC_y} - r_{yC_x}r_{yC_y} + r_{xy}(r_{xC_x}r_{yC_y} + r_{xC_y}r_{yC_x}) = 0.$$

But

$$r_{xy} = r_{yC_x}/r_{xC_x} = r_{xC_y}/r_{yC_y},$$

and if we substitute for $r_{yC_x}$ and $r_{xC_y}$ we have

$$r_{C_xC_y}(1 - r_{xy}^2) - (r_{xy} - r_{xy}^3)\,r_{yC_y}r_{xC_x} = 0,$$

or

$$r_{xy} = \frac{r_{C_xC_y}}{r_{yC_y}\,r_{xC_x}}$$

as before.











Thus $1/(r_{xC_x}r_{yC_y})$ appears to be the corrective factor when we group the
variates $x$ and $y$ both into broad categories, the ranges of which are of any
nature.


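We have

$$r_{C_xC_y} = \frac{S(n_{st}\bar{x}_{st}\bar{y}_{st})/N}{\sqrt{S(n_s\bar{x}_s^2)/N}\,\sqrt{S(n_t\bar{y}_t^2)/N}} \tag{x}$$

and

$$r_{xy} = \frac{S\left(\dfrac{n_{st}}{N}\,\dfrac{\bar{x}_{st}}{\sigma_x}\,\dfrac{\bar{y}_{st}}{\sigma_y}\right)}{S\left(\dfrac{n_s}{N}\,\dfrac{\bar{x}_s^2}{\sigma_x^2}\right)\times S\left(\dfrac{n_t}{N}\,\dfrac{\bar{y}_t^2}{\sigma_y^2}\right)} \tag{xi}$$

where $n_{st}$ is the frequency of the cell in the $s$th category of $x$ and $t$th category
of $y$.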


It is obvious that $r_{xy}$ takes its usual value when the classes of $x$ and $y$ become
very numerous, for both factors in the denominator are then unity.

In the particular case of normal distributions, on the assumption that the
product $\bar{x}_{st}\bar{y}_{st}$ is sufficiently close to $\bar{x}_s\bar{y}_t$ to be replaced by it:

$$r_{xy} = \frac{S\left\{\dfrac{n_{st}}{n_sn_t}(z_s - z_{s+1})(z_t - z_{t+1})\right\}}{N\,S\left\{\dfrac{1}{n_s}(z_s - z_{s+1})^2\right\}S\left\{\dfrac{1}{n_t}(z_t - z_{t+1})^2\right\}} \tag{xii}$$

which admits of fairly ready determination from Sheppard’s Tables.





(5) An approximate value of $r_{xC_x}$ can be reached in the following manner, if
we suppose the ranges of the broad classes are all equal and given by $h$.
Assume a parabola

$$y = a + bx + cx^2$$

to give by its area the three class frequencies $n_{s-1}$, $n_s$, $n_{s+1}$, the origin being at the
start of the $n_{s-1}$ class. Its equation will be found to be

$$y = \frac{2n_{s+1} - 7n_s + 11n_{s-1}}{6h} - \frac{n_{s+1} - 3n_s + 2n_{s-1}}{h}\left(\frac{x}{h}\right) + \frac{n_{s+1} - 2n_s + n_{s-1}}{2h}\left(\frac{x}{h}\right)^2.$$

But

$$n_s\bar{x}_s = \int_h^{2h} yx\,dx = \frac{h}{24}(n_{s+1} + 36n_s - n_{s-1}),$$

or

$$\bar{x}_s = \tfrac32 h + \frac{h}{24}\,\frac{n_{s+1} - n_{s-1}}{n_s}.$$

Thus, if $x_s'$ be the distance from the origin to the middle of the $s$th class,

$$\bar{x}_s - x_s' = -\frac{h}{24}\,\frac{n_{s-1} - n_{s+1}}{n_s},$$


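and if $\bar{x}_s$ and $x_s$ be measured from the mean of the whole population

$$\bar{x}_s = x_s - \frac{h}{24}\,\frac{n_{s-1} - n_{s+1}}{n_s} \tag{xiii}$$

and

$$n_s\bar{x}_s^2 = n_sx_s^2 - \frac{h}{12}\,x_s(n_{s-1} - n_{s+1}) + \frac{h^2}{576}\,\frac{(n_{s-1} - n_{s+1})^2}{n_s}$$

$$= n_sx_s^2 - \frac{h}{12}(x_{s-1}n_{s-1} - x_{s+1}n_{s+1}) - \frac{h^2}{12}n_{s-1} - \frac{h^2}{12}n_{s+1} + \frac{h^2}{576}\,\frac{(n_{s-1} - n_{s+1})^2}{n_s}.$$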





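Summing for all classes and dividing by $N\sigma_x^2$

$$S\left(\frac{n_s\bar{x}_s^2}{N\sigma_x^2}\right) = \frac{S(n_sx_s^2)}{N\sigma_x^2} - \frac{h^2}{12\sigma_x^2} - \frac{h^2}{12\sigma_x^2} + \frac{h^2}{576\sigma_x^2}\,S\left\{\frac{(n_{s-1} - n_{s+1})^2}{Nn_s}\right\}.$$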


But by Sheppard’s Theorem, with contact at the tails of the order we are
supposing:

$$\frac{S(n_sx_s^2)}{N\sigma_x^2} = 1 + \frac{h^2}{12\sigma_x^2}.$$

Further, if we suppose $y_{s-1}$, $y_s$, $y_{s+1}$, $y_{s+2}$ to be the bounding ordinates of the
classes $n_{s-1}$, $n_s$, $n_{s+1}$ of the frequency curve,

$$a_{s-1} = n_{s-1} - y_sh = n_{s-1} - \tfrac16(-n_{s+1} + 5n_s + 2n_{s-1}) = \tfrac16(4n_{s-1} - 5n_s + n_{s+1}).$$

Similarly

$$a_{s+1} = n_{s+1} - y_{s+1}h = \tfrac16(n_{s-1} - 5n_s + 4n_{s+1}).$$

Here $a_{s-1}$ and $a_{s+1}$ are the excesses of $n_{s-1}$ and $n_{s+1}$ above rectangles; and we have

$$a_{s-1} - a_{s+1} = \tfrac12(n_{s-1} - n_{s+1}).$$


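Thus finally

$$S\left(\frac{n_s\bar{x}_s^2}{N\sigma_x^2}\right) = 1 - \frac{1}{12}\frac{h^2}{\sigma_x^2} + \frac{1}{144}\frac{h^2}{\sigma_x^2}\,S\left\{\frac{(a_{s-1} - a_{s+1})^2}{Nn_s}\right\} \tag{xiv}$$

Consider the last term; it may be written

$$\frac{1}{36}\frac{h^2}{\sigma_x^2}\,S\left\{\frac{\tfrac12(a_s - a_{s-1}) + \tfrac12(a_{s+1} - a_s)}{n_s}\times\frac{\tfrac12(a_s - a_{s-1}) + \tfrac12(a_{s+1} - a_s)}{N}\right\}.$$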

















Now the numerators are mean differences of what will be, if $h$ be at all small,
small areas, and these mean differences are divided in the one case by $n_s$ and the
other by $N$, much larger quantities. Hence the last term is as a rule very much
smaller than the second, and we have approximately

$$S\left(\frac{n_s\bar{x}_s^2}{N\sigma_x^2}\right) = 1 - \frac{1}{12}\frac{h^2}{\sigma_x^2} \tag{xv}$$

Thus we see how rapidly, if $h$ be small with regard to $\sigma_x$, the value of the
quantity

$$S\left(\frac{n_s\bar{x}_s^2}{N\sigma_x^2}\right)$$

and therefore of $r_{xC_x}$, its square root, approaches unity.


(6) It is desirable to illustrate the approach of $r_{xC_x}$ to unity as we increase
the number of groups. I have therefore worked out this approach for three and
more symmetrical groups when (a) we use the approximate formula (xv), (b) we
still suppose the classes to have their ranges equal but the frequencies to be given
by a normal curve, (c) we assume the frequencies in the classes and not the class
ranges to be equal, the frequency being supposed normal, (d) we suppose the
frequencies to increase by 50% at each stage, i.e. to be as 1, 3/2, 9/4, 27/8, etc.

These cases will be sufficient to indicate what sort of frequencies we should
take for few classes in order to get the highest correlation between variate and
its class-mark. I shall after these theoretical investigations consider a few actual
cases of “broad” categories.


Values of $r_{xC_x}$ for Various Groupings*.

 No. of  | (a) Equal Ranges,   | (b) Equal Ranges, | (c) Equal        | (d) Increasing
 Classes | Any Frequency.      | Normal Frequency  | Subfrequencies,  | Subfrequencies,
         | Formula (xv)        |                   | Normal Frequency | Normal Frequency
    3    | .817 (h = 2σ)       | .859              | .891             | .876
    4    | .901 (h = 1.5σ)     | .915              | .928             | .912
    5    | .938 (h = 1.2σ)     | .943              | .947             | .928
    6    | .957 (h = σ)        | .960              | .959             | .939
    8    | .978 (h = .75σ)     | .977              | .972             | –
   10    | .985 (h = .6σ)      | .985              | .979             | –
   12    | .990 (h = .5σ)      | .989              | .984             | –
   14    | .992 (h = .43σ)     | .992              | .987             | –
   16    | .994 (h = .375σ)    | .994              | .990             | –
   20    | .996 (h = .3σ)      | .996              | .992             | –


It is clear from the fourth column that nothing whatever is gained by
exaggerating the frequencies of the extreme or tail sections. Further:

(a) Equal frequencies are better than equal ranges up to about six classes.
After six classes it is better to take equal ranges.

(b) After six classes the approximate formula (xv) is amply sufficient in the
case of equal ranges to obtain the value of $r_{xC_x}$.

Thus we have the general rule that up to six broad categories it is desirable to
make the frequencies of those classes approximately equal, but beyond this greater
exactness will be reached by equal ranges. The differences, however, between
equal ranges and equal frequencies are after six classes so small, within the value
of the usual probable error of random sampling, that either method will give
practically quite good results.

* The number of classes given in (a) and (b) is based upon the range of 6σ covering for such data as
we usually deal with, i.e. 500 to 1000 cases, the total frequency.
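
Columns (a) and (b) of the table can be retraced as follows. This is a sketch under two assumptions drawn from the text rather than stated in it as formulae: that (xv) amounts to $r_{xC_x}^2 = 1 - h^2/12\sigma_x^2$, i.e. (xiv) without its last term, and that in column (b) the equal ranges span 6σ with the two end classes also taking in the tails beyond. The function names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

def approx_equal_ranges(h_over_sigma):
    """Approximate formula (xv): r^2 = 1 - h^2 / (12 sigma^2)."""
    return sqrt(1 - h_over_sigma ** 2 / 12)

def normal_equal_ranges(classes, span=6.0):
    """Formula (iii) for `classes` equal ranges covering `span` sigma.

    The end classes are taken to include the tails beyond the span
    (an assumption that reproduces the tabled values for few classes).
    """
    unit = NormalDist()
    h = span / classes
    cuts = [-span / 2 + h * i for i in range(1, classes)]
    cdf = [0.0] + [unit.cdf(c) for c in cuts] + [1.0]
    z = [0.0] + [unit.pdf(c) for c in cuts] + [0.0]
    s = sum((z[i] - z[i + 1]) ** 2 / (cdf[i + 1] - cdf[i])
            for i in range(classes))
    return sqrt(s)

for k in (3, 4, 6, 10, 20):
    print(k, round(approx_equal_ranges(6.0 / k), 3),
          round(normal_equal_ranges(k), 3))
```

For three and four classes the two columns differ sensibly; as the classes grow more numerous the two functions draw together, which is the point of remark (b).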


(7) It is interesting to note the relation of the present method to the usual
correction for grouping in the value of the correlation. In that method we take
the groups at their mid-points, and we do not correct the product, but only
the standard deviations of the two variates by the usual Sheppard’s correction,
1/12th of the sub-range squared being subtracted from the raw second moment
coefficient. Now we have

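$$r_{xy} = S(n_{st}x_sy_t)/(N\sigma_x\sigma_y),$$

and we have to replace $x_s$, $y_t$ by $\bar{x}_s$ and $\bar{y}_t$.

But we have seen that approximately with equal ranges

$$\bar{x}_s = x_s - \frac{1}{24}h\,\frac{n_{s-1} - n_{s+1}}{n_s},$$

$$\bar{y}_t = y_t - \frac{1}{24}k\,\frac{n_{t-1} - n_{t+1}}{n_t},$$

$$S\left\{\frac{n_s}{N}\left(\frac{\bar{x}_s}{\sigma_x}\right)^2\right\} = 1 - \frac{1}{12}\frac{h^2}{\sigma_x^2}, \qquad S\left\{\frac{n_t}{N}\left(\frac{\bar{y}_t}{\sigma_y}\right)^2\right\} = 1 - \frac{1}{12}\frac{k^2}{\sigma_y^2}.$$

Now

$$S(n_{st}\bar{x}_s\bar{y}_t) = S(n_{st}x_sy_t) - \frac{1}{24}h\,S\left(\frac{n_{s-1} - n_{s+1}}{n_s}\,n_{st}y_t\right) - \frac{1}{24}k\,S\left(\frac{n_{t-1} - n_{t+1}}{n_t}\,n_{st}x_s\right) + \frac{1}{576}hk\,S\left\{n_{st}\,\frac{n_{s-1} - n_{s+1}}{n_s}\,\frac{n_{t-1} - n_{t+1}}{n_t}\right\} \tag{xvi}$$

Consider first the last term*, it contains not only the product of $hk$, but also the
product of differences, and is of the form, when we divide by $N\sigma_x\sigma_y$,

$$\frac{1}{36}\,\frac{h}{\sigma_x}\,\frac{k}{\sigma_y}\,S\left\{\frac{n_{st}}{N}\times\frac{\tfrac12(a_s - a_{s-1}) + \tfrac12(a_{s+1} - a_s)}{n_s}\times\frac{\tfrac12(a_t' - a_{t-1}') + \tfrac12(a_{t+1}' - a_t')}{n_t}\right\}$$

which will be of the fourth order of small quantities if $h/\sigma_x$ and $k/\sigma_y$ be small and
may be neglected as compared to the square of small quantities.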


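* As a matter of fact for Gaussian frequency

$$S\left(n_{st}\,\frac{n_{s-1} - n_{s+1}}{n_s}\,\frac{n_{t-1} - n_{t+1}}{n_t}\right) = \frac{4hk}{\sigma_x^2\sigma_y^2}\,S(n_{st}x_sy_t)$$

to our degree of approximation. Thus the fourth term may be written $\dfrac{1}{144}\dfrac{h^2k^2}{\sigma_x^2\sigma_y^2}\,S(n_{st}x_sy_t)$, which gives

$$S(n_{st}\bar{x}_s\bar{y}_t) = S(n_{st}x_sy_t)\left(1 - \frac{h^2}{12\sigma_x^2} - \frac{k^2}{12\sigma_y^2} + \frac{h^2k^2}{144\sigma_x^2\sigma_y^2}\right)$$

$$= S(n_{st}x_sy_t)\left(1 - \frac{h^2}{12\sigma_x^2}\right)\left(1 - \frac{k^2}{12\sigma_y^2}\right).$$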
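Next, considering the second summation and summing first $n_{st}$ for all values
of $t$, it equals $n_s\bar{y}_s$, where $\bar{y}_s$ is mean of all $y$’s in array corresponding to $x_s$. But
if the regression be linear $\bar{y}_s = r\dfrac{\sigma_y}{\sigma_x}x_s$. Thus the second sum equals

$$-\frac{1}{24}\,h\,r\frac{\sigma_y}{\sigma_x}\,S\{(n_{s-1} - n_{s+1})x_s\}$$

$$= -\frac{1}{24}\,h\,r\frac{\sigma_y}{\sigma_x}\,\{S(x_{s-1}n_{s-1} - x_{s+1}n_{s+1}) + hS(n_{s-1}) + hS(n_{s+1})\}$$

$$= -\frac{1}{24}\,r\sigma_y\sigma_x\,\frac{h^2}{\sigma_x^2}\,2N$$

as before. But in a term of this order we may put

$$Nr\sigma_y\sigma_x = S(n_{st}x_sy_t).$$

Similarly the third summation

$$= -\frac{1}{12}\,S(n_{st}x_sy_t)\times\frac{k^2}{\sigma_y^2}.$$

Or, finally,

$$S(n_{st}\bar{x}_s\bar{y}_t) = S(n_{st}x_sy_t)\left(1 - \frac{h^2}{12\sigma_x^2} - \frac{k^2}{12\sigma_y^2}\right) \tag{xvi bis}$$

Thus

$$\frac{S\left(\dfrac{n_{st}}{N}\,\dfrac{\bar{x}_s}{\sigma_x}\,\dfrac{\bar{y}_t}{\sigma_y}\right)}{S\left(\dfrac{n_s}{N}\,\dfrac{\bar{x}_s^2}{\sigma_x^2}\right)S\left(\dfrac{n_t}{N}\,\dfrac{\bar{y}_t^2}{\sigma_y^2}\right)} = \frac{S\left(\dfrac{n_{st}}{N}\,\dfrac{x_s}{\sigma_x}\,\dfrac{y_t}{\sigma_y}\right)\left(1 - \dfrac{h^2}{12\sigma_x^2} - \dfrac{k^2}{12\sigma_y^2}\right)}{\left(1 - \dfrac{h^2}{12\sigma_x^2}\right)\left(1 - \dfrac{k^2}{12\sigma_y^2}\right)},$$

or, since we neglect terms of fourth order,

$$\frac{S\left(\dfrac{n_{st}}{N}\,\dfrac{\bar{x}_s}{\sigma_x}\,\dfrac{\bar{y}_t}{\sigma_y}\right)}{S\left(\dfrac{n_s}{N}\,\dfrac{\bar{x}_s^2}{\sigma_x^2}\right)S\left(\dfrac{n_t}{N}\,\dfrac{\bar{y}_t^2}{\sigma_y^2}\right)} = S\left(\frac{n_{st}}{N}\,\frac{x_s}{\sigma_x}\,\frac{y_t}{\sigma_y}\right) = r_{xy} \tag{xvii}$$

the usual value, $\sigma_x$ and $\sigma_y$ having of course to be corrected by Sheppard.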


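We see by (xv) that

$$\sigma_x^2 = S\left(\frac{n_s}{N}\bar{x}_s^2\right)\left(1 + \frac{1}{12}\frac{h^2}{\sigma_x^2}\right) \tag{xvii bis}$$

but by Sheppard

$$\sigma_x^2 = S\left(\frac{n_s}{N}x_s^2\right)\left(1 - \frac{1}{12}\frac{h^2}{\sigma_x^2}\right).$$

Accordingly

$$S\left(\frac{n_s}{N}\bar{x}_s^2\right) = S\left(\frac{n_s}{N}x_s^2\right)\left(1 - \frac{1}{6}\frac{h^2}{\sigma_x^2}\right) \tag{xviii}$$

which enables us with equal small subranges to use the standard deviation of
means or mid-abscissae at our pleasure.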


(8) Of course it is absurd in practice to push our results to the extreme* of
two categories only, but theoretically it is not without interest to note the results
which flow from such an assumption.

Let the division be into two categories containing $N\times\tfrac12(1+a)$ and $N\times\tfrac12(1-a)$
individuals, and let no assumption be made as to the nature of the frequency.
Then

$$S\left(\frac{n_s}{N}\bar{x}_s^2\right) = \tfrac12(1+a)\bar{x}_1^2 + \tfrac12(1-a)\bar{x}_2^2,$$

but

$$\tfrac12(1+a)\bar{x}_1 + \tfrac12(1-a)\bar{x}_2 = 0,$$

and therefore

$$S\left(\frac{n_s}{N}\bar{x}_s^2\right) = \bar{x}_1^2\,\frac{1+a}{1-a}.$$

Thus

$$r_{xC_x} = \frac{\bar{x}_1}{\sigma_x}\sqrt{\frac{1+a}{1-a}} \tag{xix}$$

* We have seen that $\bar{x}_{st}\bar{y}_{st}$ can only be replaced by $\bar{x}_s\bar{y}_t$ in (xi), or as a special case, (xii) used,
provided the subranges are equal and fairly small.


We note therefore that unless we suppose all the frequency in each category
concentrated at one point, a very rare occurrence, $r_{xC_x}$ will always be less than
unity, and therefore in correlating class indices a correction will always have to
be made. We will consider two cases: (i) when the frequency distribution is a
rectangle of length $l$. In this case $\bar{x}_1$ is $= l\,\{\tfrac14(1-a)\}$ and $\sigma_x = l/(2\sqrt{3})$, we have then

$$r_{xC_x} = \sqrt{3\times\tfrac12(1+a)\times\tfrac12(1-a)} \tag{xx}$$

(ii) the distribution of frequency is supposed to be Gaussian. In this case

$$\bar{x}_1 = \frac{2z\times\sigma_x}{1+a},$$

and

$$r_{xC_x} = \frac{z}{\sqrt{\tfrac12(1-a)\times\tfrac12(1+a)}} \tag{xxi}$$

in other words, $r_{xC_x}$ is the reciprocal of the $\chi$ already tabled in this number of
Biometrika, p. 27.


We have the following results:

 Value of ½(1+a)   |        Value of $r_{xC_x}$
 as percentage of  | Gaussian    | Rectangular
 total             | Frequency   | Frequency
      50 %         |   .798      |   .866
      60 %         |   .789      |   .849
      70 %         |   .759      |   .794
      80 %         |   .700      |   .693
      90 %         |   .585      |   .520
      95 %         |   .473      |   .377
      99 %         |   .268      |   .172

It will be seen at once how rapidly the corrective factor $1/r_{xC_x}$ rises as the
division of the categories becomes more and more unequal. But in both cases
very sensible correction is needful and its nature depends on the particular
frequency.
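
The two columns of this table follow directly from (xx) and (xxi). The following is a minimal sketch, with illustrative function names, in which `share` stands for ½(1+a), the fraction of the total falling in the larger category.

```python
from math import sqrt
from statistics import NormalDist

def gaussian_two_classes(share):
    """Formula (xxi): z / sqrt(share * (1 - share)), z being the normal
    ordinate at the dividing line."""
    unit = NormalDist()
    z = unit.pdf(unit.inv_cdf(share))
    return z / sqrt(share * (1 - share))

def rectangular_two_classes(share):
    """Formula (xx): sqrt(3 * share * (1 - share))."""
    return sqrt(3 * share * (1 - share))

for percent in (50, 60, 70, 80, 90, 95, 99):
    share = percent / 100
    print(percent, round(gaussian_two_classes(share), 3),
          round(rectangular_two_classes(share), 3))
```

The reciprocal of either function is the corrective factor $1/r_{xC_x}$ spoken of in the text.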
















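Let us now pass to the consideration of the correlation $r_{xy}$ as obtained by
correction from the class index correlation, i.e.

$$r_{xy} = \frac{r_{C_xC_y}}{r_{xC_x}\times r_{yC_y}}.$$

The difficulty is to know how to determine the product $S(n_{st}\bar{x}_{st}\bar{y}_{st})$ for such a
slender division as a fourfold table. If we assume it still to be equal to

$$S(n_{st}\bar{x}_s\bar{y}_t)$$

we can then find $r_{xy}$.

We have

$$S(n_{st}\bar{x}_{st}\bar{y}_{st}) = n_1\bar{x}_1'\bar{y}_1' + n_2\bar{x}_2'\bar{y}_2' + n_3\bar{x}_3'\bar{y}_3' + n_4\bar{x}_4'\bar{y}_4',$$

where $\bar{x}_i'$, $\bar{y}_i'$ are the means for the cell $n_i$, and where we take for our fourfold table:

   n_1        | n_2        | n_1 + n_2
   n_4        | n_3        | n_3 + n_4
   n_1 + n_4  | n_2 + n_3  | N

But clearly on the above assumption

$$\bar{x}_1' = \bar{x}_4' = \bar{x}_1, \quad \bar{x}_2' = \bar{x}_3' = \bar{x}_2, \quad \bar{y}_1' = \bar{y}_2' = \bar{y}_1, \quad \bar{y}_3' = \bar{y}_4' = \bar{y}_2.$$

Thus

$$S(n_{st}\bar{x}_s\bar{y}_t) = n_1\bar{x}_1\bar{y}_1 + n_2\bar{x}_2\bar{y}_1 + n_3\bar{x}_2\bar{y}_2 + n_4\bar{x}_1\bar{y}_2,$$

then since

$$(n_1 + n_4)\bar{x}_1 + (n_2 + n_3)\bar{x}_2 = 0,$$

$$(n_1 + n_2)\bar{y}_1 + (n_3 + n_4)\bar{y}_2 = 0,$$

we easily deduce

$$S(n_{st}\bar{x}_s\bar{y}_t) = \bar{x}_1\bar{y}_1\,\frac{N(n_1n_3 - n_2n_4)}{(n_2 + n_3)(n_3 + n_4)} \tag{xxii}$$

Similarly we find

$$\sigma_{\bar{x}_s}^2 = \bar{x}_1^2\,\frac{n_1 + n_4}{n_2 + n_3}, \qquad \sigma_{\bar{y}_t}^2 = \bar{y}_1^2\,\frac{n_1 + n_2}{n_3 + n_4} \tag{xxiii}$$

And accordingly

$$r_{C_xC_y} = \frac{S(n_{st}\bar{x}_s\bar{y}_t)}{N\sigma_{\bar{x}_s}\sigma_{\bar{y}_t}} = \frac{n_1n_3 - n_2n_4}{\sqrt{(n_1 + n_2)(n_1 + n_4)(n_2 + n_3)(n_3 + n_4)}} \tag{xxiv}$$

This is not really the true value of the class index correlation unless we assume
$\bar{x}_{st}\bar{y}_{st} = \bar{x}_s\bar{y}_t$, which is not generally true. It is really the correlation of the means
used as class-marks, and as shown long ago by me it is the correlation of errors in
the means of the two variates, and also equals $\phi$ the basis of the mean square
contingency. If we correct to obtain $r_{xy}$ by the factors

$$r_{xC_x} = \sigma_{\bar{x}_s}/\sigma_x \quad\text{and}\quad r_{yC_y} = \sigma_{\bar{y}_t}/\sigma_y,$$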
we find

$$r_{xy} = \frac{n_1n_3 - n_2n_4}{(n_1 + n_2)(n_1 + n_4)}\times\frac{\sigma_x}{\bar{x}_1}\,\frac{\sigma_y}{\bar{y}_1} \tag{xxv}$$

Mr Yule has termed (xxiv) the “theoretical value of the correlation*.”
It will be quite obvious from the present discussion that it is only an approximate
value even to the class index correlation, and cannot represent $r_{xy}$ at all, unless
the frequency distributions each consist of two isolated points. Generally

$$r_{xy} = \frac{n_1n_3 - n_2n_4}{\sqrt{(n_1 + n_2)(n_1 + n_4)(n_2 + n_3)(n_3 + n_4)}}\times\frac{\sigma_x}{\sigma_{\bar{x}_s}}\,\frac{\sigma_y}{\sigma_{\bar{y}_t}}$$

$$= r_{C_xC_y}\times\frac{\sigma_x\,\sigma_y}{\sigma_{\bar{x}_s}\sigma_{\bar{y}_t}} \tag{xxvi}$$

This corrective value on $r_{C_xC_y}$ is always true, although the above value
of $r_{C_xC_y}$ is only approximate.


This is brought out at once by considering what (xxv) reduces to in the case
of Gaussian frequency. Here

$$(n_1 + n_4)\bar{x}_1 = Nz\sigma_x = N\sigma_x\,\frac{1}{\sqrt{2\pi}}\,e^{-\frac12 h^2/\sigma_x^2},$$

$$(n_1 + n_2)\bar{y}_1 = Nz'\sigma_y = N\sigma_y\,\frac{1}{\sqrt{2\pi}}\,e^{-\frac12 k^2/\sigma_y^2},$$

where $h$ and $k$ are the distances of the dividing lines from the mean. Hence if
$H$ and $K$ have their usual meanings (xxv) gives us

$$r_{xy} = \frac{n_1n_3 - n_2n_4}{N^2HK} = \epsilon \tag{xxvii}$$

$\epsilon$ is the expression used in my memoir on the correlation of characters not
quantitatively measurable†. Thus the physical meaning of $\epsilon$ is $r_{C_xC_y}$ corrected
for the use of class indices, but not for the assumption that $\bar{x}_{st}\times\bar{y}_{st}$ may be
replaced by $\bar{x}_s\times\bar{y}_t$. We may therefore anticipate that $\epsilon$ or, failing Gaussian
frequency, (xxv), will give a better approximation than $r_{C_xC_y}$ to the true value of
the correlation of $x$ and $y$.
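
To see the two fourfold coefficients side by side, the following sketch evaluates (xxiv) and the Gaussian expression (xxvii) for the same table. The cell counts are hypothetical, chosen only for illustration, and the function name is an assumption; $H$ and $K$ are taken as the ordinates of the unit normal curve at the two dividing lines.

```python
from math import pi, sqrt
from statistics import NormalDist

def fourfold(n1, n2, n3, n4):
    """Fourfold table laid out as
           n1  n2
           n4  n3
    Returns the class-index value (xxiv) and the Gaussian value (xxvii)."""
    N = n1 + n2 + n3 + n4
    cross = n1 * n3 - n2 * n4
    r_class = cross / sqrt((n1 + n2) * (n3 + n4) * (n1 + n4) * (n2 + n3))
    unit = NormalDist()
    H = unit.pdf(unit.inv_cdf((n1 + n4) / N))   # ordinate at the x division
    K = unit.pdf(unit.inv_cdf((n1 + n2) / N))   # ordinate at the y division
    epsilon = cross / (N * N * H * K)
    return r_class, epsilon

# Hypothetical counts with both divisions at the median.
r_class, epsilon = fourfold(40, 10, 40, 10)
print(round(r_class, 3), round(epsilon, 3))
```

With both divisions at the median each class-mark correlation is $\sqrt{2/\pi}$, so that the second value exceeds the first in the ratio π/2.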


I shall discuss elsewhere the possibility of any absolute identity in the
members of a class; the only thing in which I have personally come across it
is in theoretical investigations, never in practical, on Mendelian units. When
we are absolutely ignorant of the nature of the frequency, then I feel sure that
in dealing with either physical or mental characteristics in living forms, the
Gaussian distribution will in the long run be found closer than any single type
of distribution hitherto discussed, and that accordingly even $\epsilon$, to say nothing
of $r$ by fourfold, would be far safer to use than $r_{C_xC_y}$, or any arbitrary coefficient
of association, which wholly neglects the question of the type of the frequency
distribution. In the rare cases in which the frequency is collected into points,
absolute homogeneity of category contents, then $r_{C_xC_y}$ is the right coefficient
to use and not Yule’s coefficient of association.

* Introduction to the Theory of Statistics, p. 212.
† Phil. Trans. Vol. 195, A, p. 7.

(9) I propose in this section to consider the relations between

$$S\left(\frac{n_{st}}{N}\,\bar{x}_{st}\bar{y}_{st}\right) \quad\text{and}\quad S\left(\frac{n_{st}}{N}\,\bar{x}_s\bar{y}_t\right)$$

when the subranges for each character are small and equal. Let those for $x$ be $h$,
and for $y$ be $k$. Then consider the surface

$$z = a + bx + cy + dx^2 + ey^2,$$

and choose $a$, $b$, $c$, $d$, $e$ so that it gives by its volumes correctly the five class
frequencies

$$n_{s-1,t},\quad n_{st},\quad n_{s+1,t},\quad n_{s,t-1},\quad n_{s,t+1}.$$

Then if we take the origin so that it lies at the mid-point of the $n_{st}$ group,
we find

$$z = \frac{1}{hk}\left\{n_{st} - \frac{1}{24}(n_{s+1,t} + n_{s-1,t} - 2n_{st}) - \frac{1}{24}(n_{s,t+1} + n_{s,t-1} - 2n_{st})\right\}$$

$$+ \frac{x}{2h}\,\frac{n_{s+1,t} - n_{s-1,t}}{hk} + \frac{y}{2k}\,\frac{n_{s,t+1} - n_{s,t-1}}{hk}$$

$$+ \frac{x^2}{h^2}\left(\frac{n_{s-1,t} - n_{st}}{2hk} + \frac{n_{s+1,t} - n_{st}}{2hk}\right) + \frac{y^2}{k^2}\left(\frac{n_{s,t-1} - n_{st}}{2hk} + \frac{n_{s,t+1} - n_{st}}{2hk}\right).$$

Further

$$n_{st}(\bar{x}_{st} - x_s) = \int_{-\frac12 k}^{+\frac12 k}\int_{-\frac12 h}^{+\frac12 h} xz\,dx\,dy = \frac{1}{12}\,bh^3k = \frac{1}{24}\,h\,(n_{s+1,t} - n_{s-1,t}).$$

Or

$$\bar{x}_{st} = x_s + \frac{1}{24}\,h\,\frac{n_{s+1,t} - n_{s-1,t}}{n_{st}} \tag{xxviii}$$

Similarly

$$\bar{y}_{st} = y_t + \frac{1}{24}\,k\,\frac{n_{s,t+1} - n_{s,t-1}}{n_{st}} \tag{xxix}$$

Now take the product of these, multiply by $n_{st}$ and sum, we have

$$S\left(\frac{n_{st}}{N}\bar{x}_{st}\bar{y}_{st}\right) = S\left(\frac{n_{st}}{N}x_sy_t\right) + \frac{1}{24}\,h\,S\left\{\frac{y_t}{N}(n_{s+1,t} - n_{s-1,t})\right\} + \frac{1}{24}\,k\,S\left\{\frac{x_s}{N}(n_{s,t+1} - n_{s,t-1})\right\}$$

$$+ \frac{1}{144}\,hk\,S\left\{\left(\frac{n_{s,t+1} - n_{st} + n_{st} - n_{s,t-1}}{2N}\right)\left(\frac{n_{s+1,t} - n_{st} + n_{st} - n_{s-1,t}}{2n_{st}}\right)\right\}.$$

But if $h$ and $k$ be small the last summation is, precisely as that dealt with on
p. 122, of the fourth order, and we have

$$S\left(\frac{n_{st}}{N}\bar{x}_{st}\bar{y}_{st}\right) = S\left(\frac{n_{st}}{N}x_sy_t\right) \tag{xxx}$$
for the second and third sums obviously vanish if we suppose zero frequencies
at the boundaries of each row or column. Hence by (xvi) bis we have

$$S\left(\frac{n_{st}}{N}\bar{x}_{st}\bar{y}_{st}\right) = S\left(\frac{n_{st}}{N}\bar{x}_s\bar{y}_t\right)\times\left(1 + \frac{h^2}{12\sigma_x^2} + \frac{k^2}{12\sigma_y^2}\right) \tag{xxxi}$$

Thus we cannot replace $S(n_{st}\bar{x}_{st}\bar{y}_{st})$ by $S(n_{st}\bar{x}_s\bar{y}_t)$ unless $h$ and $k$ are very small
relative to $\sigma_x$ and $\sigma_y$ respectively.


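But now consider $r_{xy} = \dfrac{S(n_{xy}\,xy)}{N\sigma_x\sigma_y}$, the accurate value.

We may replace $S(n_{xy}\,xy)$ by $S(n_{st}x_sy_t)$, there being no Sheppard’s correction
for the product moment. Hence

$$r_{xy} = \frac{S\left(\dfrac{n_{st}}{N}x_sy_t\right)}{\sigma_x\sigma_y} = \frac{S\left(\dfrac{n_{st}}{N}\bar{x}_{st}\bar{y}_{st}\right)}{\sigma_x\sigma_y} \quad\text{by (xxx)}$$

$$= S\left(\frac{n_{st}}{N}\,\frac{\bar{x}_s}{\sigma_x}\,\frac{\bar{y}_t}{\sigma_y}\right)\left(1 + \frac{h^2}{12\sigma_x^2} + \frac{k^2}{12\sigma_y^2}\right) \quad\text{by (xxxi)}$$

$$= \frac{S\left(\dfrac{n_{st}}{N}\,\dfrac{\bar{x}_s}{\sigma_x}\,\dfrac{\bar{y}_t}{\sigma_y}\right)}{\left(1 - \dfrac{h^2}{12\sigma_x^2}\right)\left(1 - \dfrac{k^2}{12\sigma_y^2}\right)} \quad\text{to the same degree of approximation}$$

$$= \frac{S\left(\dfrac{n_{st}}{N}\,\dfrac{\bar{x}_s}{\sigma_x}\,\dfrac{\bar{y}_t}{\sigma_y}\right)}{S\left(\dfrac{n_s}{N}\,\dfrac{\bar{x}_s^2}{\sigma_x^2}\right)S\left(\dfrac{n_t}{N}\,\dfrac{\bar{y}_t^2}{\sigma_y^2}\right)} \tag{xxxii}$$

This formula is free from all Sheppard’s corrections, and in form it does not
involve the equality of the subranges. Hence it seems likely to give moderately
good results even for unequal subranges, provided they do not differ very widely,
and there are not too few of them. It leads as before in the case of normal
frequency to

$$r_{xy} = \frac{S\left\{\dfrac{n_{st}}{n_sn_t}(z_s - z_{s+1})(z_t - z_{t+1})\right\}}{N\,S\left\{\dfrac{1}{n_s}(z_s - z_{s+1})^2\right\}S\left\{\dfrac{1}{n_t}(z_t - z_{t+1})^2\right\}}$$

but it is clear that this cannot be pressed so far as to make only a very few, and
those very unequal, divisions for each variate.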





(10) The present method of correction seems likely to give good results in 
the case of the method of mean square contingency*, when we may reasonably 
assume the distribution not widely divergent from the Gaussian. 


* A much fuller discussion of all the corrections for contingency has been some years in progress 
and will shortly appear. 
Illustration I. Thus I take the table for correlation of stature in father and
son, the subranges being each two inches.

                                         Stature of Father.

 Stature    | 58.5– | 60.5– | 62.5– | 64.5– | 66.5– | 68.5– | 70.5– | 72.5– | 74.5– |
 of Son     | 60.5  | 62.5  | 64.5  | 66.5  | 68.5  | 70.5  | 72.5  | 74.5  | 76.5  | Totals
 59.5–61.5  |  –    |  –    |  1.5  |  1    |  1    |  –    |  –    |  –    |  –    |    3.5
 61.5–63.5  |  .5   |  2.75 |  5.75 |  9.5  |  5    |  .25  |  .25  |  –    |  –    |   24
 63.5–65.5  | 4     |  7.75 | 20    | 41.5  | 17.25 |  8.25 |  1.25 |  –    |  –    |  100
 65.5–67.5  | 2     | 10    | 32    | 73    | 78.75 | 33.5  |  7.25 |  1    |  –    |  237.5
 67.5–69.5  |  –    |  4.5  | 27.75 | 65.5  | 95    | 93.25 | 31.5  |  4.5  |  1    |  323
 69.5–71.5  |  –    |  –    |  6.75 | 38.25 | 61    | 77.5  | 39.5  | 11    |  2    |  236
 71.5–73.5  |  –    |  –    |  .25  |  5.75 | 24.75 | 34.5  | 32.25 |  7    |  .5   |  105
 73.5–75.5  |  –    |  –    |  1    |  3    |  6.25 |  6.75 | 13    |  5.5  |  2    |   37.5
 75.5–77.5  |  –    |  –    |  –    |  –    |  2.5  |  1.5  |  1.5  |  2.5  |  –    |    8
 77.5–79.5  |  –    |  –    |  –    |  –    |  –    |  2    |  .5   |  1    |  –    |    3.5
 Totals     | 6.5   | 25    | 95    | 237.5 | 291.5 | 257.5 | 127   | 32.5  |  5.5  | 1078





Worked out by the product-moment method and using the Sheppard correction
for the S.D.’s, $r_{xy} = .5259$.

I then proceed to make it into two further tables (i) a 5 × 5, and (ii) a 3 × 3
celled table.

These are as follows:


                              Stature of Father.

 Stature    | 58.5– | 64.5– | 66.5– | 68.5– | 70.5– |
 of Son     | 64.5  | 66.5  | 68.5  | 70.5  | 76.5  | Totals
 59.5–65.5  | 42.25 | 52    | 23.25 |  8.5  |  1.5  |  127.5
 65.5–67.5  | 44    | 73    | 78.75 | 33.5  |  8.25 |  237.5
 67.5–69.5  | 32.25 | 65.5  | 95    | 93.25 | 37    |  323
 69.5–71.5  |  6.75 | 38.25 | 61    | 77.5  | 52.5  |  236
 71.5–79.5  |  1.25 |  8.75 | 33.5  | 44.75 | 65.75 |  154
 Totals     | 126.5 | 237.5 | 291.5 | 257.5 | 165   | 1078

and

                        Stature of Father.

 Stature    | 58.5–66.5 | 66.5–68.5 | 68.5–76.5 | Totals
 of Son     |           |           |           |
 59.5–67.5  | 211.25    | 102       |  51.75    |  365
 67.5–69.5  |  97.75    |  95       | 130.25    |  323
 69.5–79.5  |  55       |  94.5     | 240.5     |  390
 Totals     | 364       | 291.5     | 422.5     | 1078

In obtaining these tables I was generally guided by an endeavour to roughly
equalise the frequencies in the totals.
If $x$ be the stature of father and $y$ of son, then for the 5 × 5-fold table we have,
reading from left to right and from top to bottom:

$\bar{x}_s = -1.6780,\ -.7650,\ -.0694,\ +.6191,\ +1.5440,$
$\bar{y}_t = -1.6741,\ -.7617,\ -.0298,\ +.6811,\ +1.5795.$

These give

$$S\left(\frac{n_s}{N}\bar{x}_s^2\right) = .917091, \qquad S\left(\frac{n_t}{N}\bar{y}_t^2\right) = .6080509,$$

$$r_{xC_x} = .9576, \qquad r_{yC_y} = .9293.$$

I now calculated $\phi^2$ and found for raw $N\phi^2$ the value 314.7514; deducting
the correction for 5 × 5 cells, this equals 298.7514, whence $\phi^2 = .2771349$.

Taking this to represent the correlation of $C_x$ and $C_y$ we should have

$$r_{xy} = .4659/(.9576\times .9293) = .5235,$$

which is excellently in keeping with the value found from the 9 × 10 table by
product moment, and not very different from the value .5140 found from the
original 17 × 20 table in inches by the same process.

I now turn to the 3 × 3-fold table and find the raw $N\phi^2 = 219.2194$
or less the correction for 3 × 3 cells $= 215.2194$, whence $\phi^2 = .1996469$. Thus
$C = .40795$.

For this table reading in same directions as before:

$\bar{x}_s = -1.0823,\ -.0694,\ +.9803,$
$\bar{y}_t = -1.0804,\ -.0298,\ +1.0358,$

whence

$$S\left(\frac{n_s}{N}\bar{x}_s^2\right) = .773470, \qquad S\left(\frac{n_t}{N}\bar{y}_t^2\right) = .7836384,$$

$$r_{xC_x} = .8795, \qquad r_{yC_y} = .8852.$$

Thus taking as before $r_{xy} = C/(r_{xC_x}r_{yC_y})$, we find $r_{xy} = .5240$.

Again in excellent agreement with the 5 × 5 contingency result and also the
product-moment result. The evaluations of $\bar{x}_s$ and $\bar{y}_t$ are of course based on
the assumption of a Gaussian distribution for those variates.
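
The whole of the 3 × 3 computation can be retraced from the cell frequencies alone. The following is a sketch, not the method of calculation actually used: it assumes that the “correction for cells” is the deduction of (rows − 1)(columns − 1) from raw $N\phi^2$, which agrees with the two deductions printed above, that $C = \sqrt{\phi^2/(1 + \phi^2)}$, and that the class means follow the Gaussian formula (iii). Variable and function names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# 3 x 3 table: rows are stature of son, columns stature of father.
table = [[211.25, 102.0, 51.75],
         [97.75, 95.0, 130.25],
         [55.0, 94.5, 240.5]]

def class_mark_correlation(frequencies):
    """Gaussian class-mark correlation from the class totals, formula (iii)."""
    unit = NormalDist()
    total = sum(frequencies)
    cumulative, z_start, s = 0.0, 0.0, 0.0
    for i, n in enumerate(frequencies):
        cumulative += n
        is_last = i == len(frequencies) - 1
        z_end = 0.0 if is_last else unit.pdf(unit.inv_cdf(cumulative / total))
        s += (z_start - z_end) ** 2 * total / n
        z_start = z_end
    return sqrt(s)

rows = [sum(r) for r in table]               # totals for son
cols = [sum(c) for c in zip(*table)]         # totals for father
N = sum(rows)

# Raw N * phi^2, i.e. the square contingency of the table.
raw = N * (sum(n * n / (rows[i] * cols[j])
               for i, r in enumerate(table)
               for j, n in enumerate(r)) - 1)
phi2 = (raw - (len(rows) - 1) * (len(cols) - 1)) / N   # assumed cell correction
C = sqrt(phi2 / (1 + phi2))                  # coefficient of contingency

r_x = class_mark_correlation(cols)           # father with his class-mark
r_y = class_mark_correlation(rows)           # son with his class-mark
r_xy = C / (r_x * r_y)                       # corrected correlation
print(round(raw, 2), round(C, 4), round(r_x, 4), round(r_y, 4), round(r_xy, 3))
```

Run on the table as printed, this should return figures agreeing with those of the text to about the third decimal place.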


Illustration II. The following table is given by Mr W. H. Gilby in a paper 
in Biometrika (Vol. vit, p. 106) on the “Teacher's Appreciation of General 
Intelligence.” It correlates class of intelligence with character of the clothing 
in boys. By treating the grades of intelligence and clothing as Gaussian variates 
an almost linear regression line was obtained in that paper. Hence it seems a 
favourable case to examine the present corrections upon. Intelligence was 
measured on my scale (see Biometrika, Vol. VII, p. 93) and clothing was divided
into five categories, IV and V being ultimately classed together. The better
intelligence has the higher letter, the worse clothing the higher number. The
table runs:

                              Intelligence.

 Clothing  |  B  |  C  |  D  |  E  |  F  |  G  | Totals
 I         |  33 |  48 | 113 | 209 | 194 |  39 |  636
 II        |  41 | 100 | 202 | 255 | 138 |  15 |  751
 III       |  39 |  58 |  70 |  61 |  33 |   4 |  265
 IV and V  |  17 |  13 |  22 |  10 |  10 |   1 |   73
 Totals    | 130 | 219 | 407 | 535 | 375 |  59 | 1725








I shall use # for Intelligence and y for Clothing. Mr Gilby found raw 
¢* = 1014, corrected for 4x 6 cells ¢?='0927 and C='291. I find from left to 
right : 

@,=—1°8859, —1:1009, — °4758, + ‘2428, 4+1:1179, +42:2167, 
H =— 10066, + ‘2173, +1:2130, +2°1317. 





Hence
$S\left(\frac{n_s\bar{x}_s^2}{N\sigma_x^2}\right) = .933{,}393$, $S\left(\frac{n_t\bar{y}_t^2}{N\sigma_y^2}\right) = .812{,}475$,
$r_{xC_x} = .9661$, $r_{yC_y} = .9014$.


If we now correct Mr Gilby’s contingency for the class-index correlations of
$x$ and $y$ we have
$r_{xy} = C/(r_{xC_x} r_{yC_y}) = .3342.$
This is very near the values, .343 and .340, found by converting the table into a
three-rowed table, classes III, IV and V of clothing being grouped together and
a bi-serial $\eta$ method used (Biometrika, Vol. VII. p. 98).


I then proceeded to work out on the full table given above the value of r 
as determined by the approximate formula (xvii). This can be conveniently 
arranged thus: 


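  y \ x    |  -1.8859  |  -1.1009  |   -.4758   |   +.2428   |  +1.1179   |  +2.2167
  ---------+-----------+-----------+------------+------------+------------+-----------
  -1.0066  |     33    |     48    |    113     |    209     |    194     |     39
  + .2173  |     41    |    100    |    202     |    255     |    138     |     15
  +1.2130  |     39    |     58    |     70     |     61     |     33     |      4
  +2.1317  |     17    |     13    |     22     |     10     |     10     |      1
  ---------+-----------+-----------+------------+------------+------------+-----------
  Products | +33.2178  | +48.3168  | +113.7458  | -210.3794  | -195.2804  | -39.2574
           |  -8.9093  | -21.7300  |  -43.8946  |  +55.4115  |  +29.9874  |  +3.2595
           | -47.3070  | -70.3540  |  -84.9100  |  +73.9930  |  +40.0290  |  +4.8520
           | -36.2389  | -27.7121  |  -46.8974  |  +21.3170  |  +21.3170  |  +2.1317
  ---------+-----------+-----------+------------+------------+------------+-----------
  Sums     | -59.2374  | -71.4793  |  -61.9562  |  -59.6579  | -103.9470  | -29.0142
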
Therefore $S\left(\frac{n_{st}\bar{x}_s\bar{y}_t}{N\sigma_x\sigma_y}\right) = -.240{,}51548,$

$r_{C_xC_y} = \frac{-.240{,}51548}{.9661 \times .9014} = -.2762,$

and $r_{C_xC_y}/(r_{xC_x} r_{yC_y}) = -.3171$*.



In the above table each row is first multiplied by the $\bar{y}_t$ on the left and the
final sign at once given to the product; this forms the lower half of the table.
The columns are then added up with the results given at the foot. The column
sums are then multiplied by their respective $\bar{x}_s$'s, shown at the top of the first
half of the table, regardless of the sign of $\bar{x}_s$, and the sum divided by 1725 as a
continuous operation on the calculator, which shows finally .240,51548.

The value ultimately obtained, .32, is somewhat less than the .33 of the
contingency result but of the same order for all practical purposes.
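
The arithmetic of this arrangement can be sketched in a few lines of Python. This is only an illustrative check, not part of the original computation: the function name `formula_xvii` and the variable names are hypothetical, the class means are taken as printed above rather than recomputed, and the form of formula (xvii) is assumed to be the product sum divided by the two sums of squared class means, as the worked figures suggest.

```python
def formula_xvii(table, x_means, y_means):
    """Approximate r_xy from class means (assumed form of formula (xvii)).

    table[t][s] is the frequency in y-class t and x-class s;
    x_means and y_means are the class means in units of sigma.
    """
    n = sum(sum(row) for row in table)
    col_totals = [sum(col) for col in zip(*table)]   # n_s
    row_totals = [sum(row) for row in table]         # n_t
    # S(n_st * x_s * y_t) / N
    product = sum(
        cell * x_means[s] * y_means[t]
        for t, row in enumerate(table)
        for s, cell in enumerate(row)
    ) / n
    # S(n_s * x_s^2) / N and S(n_t * y_t^2) / N
    sx = sum(ns * x * x for ns, x in zip(col_totals, x_means)) / n
    sy = sum(nt * y * y for nt, y in zip(row_totals, y_means)) / n
    return product, product / (sx * sy)


# Mr Gilby's 4 x 6 table: rows are clothing I..IV and V, columns intelligence B..G.
gilby = [
    [33, 48, 113, 209, 194, 39],
    [41, 100, 202, 255, 138, 15],
    [39, 58, 70, 61, 33, 4],
    [17, 13, 22, 10, 10, 1],
]
x_means = [-1.8859, -1.1009, -0.4758, 0.2428, 1.1179, 2.2167]
y_means = [-1.0066, 0.2173, 1.2130, 2.1317]

product, r_xy = formula_xvii(gilby, x_means, y_means)
print(round(product, 5), round(r_xy, 4))
```

Working from the frequencies directly, rather than from the rounded sums .933,393 and .812,475, may shift the last decimal place slightly.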


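I next reduce my table to a 4 x 4 table as follows:

                     Intelligence.

  Clothing  | B+C |  D  |  E  | F+G | Totals
  ----------+-----+-----+-----+-----+-------
  I         |  81 | 113 | 209 | 233 |   636
  II        | 141 | 202 | 255 | 153 |   751
  III       |  97 |  70 |  61 |  37 |   265
  IV and V  |  30 |  22 |  10 |  11 |    73
  ----------+-----+-----+-----+-----+-------
  Totals    | 349 | 407 | 535 | 434 |  1725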

We have $\bar{x}_s = -1.3934,\ -.4758,\ +.2428,\ +1.2674,$

$\bar{y}_t = -1.0066,\ +.2173,\ +1.2130,\ +2.1317,$

$S\left(\frac{n_s\bar{x}_s^2}{N\sigma_x^2}\right) = .868{,}606$, $S\left(\frac{n_t\bar{y}_t^2}{N\sigma_y^2}\right) = .812{,}475$,

$r_{xC_x} = .9320$, $r_{yC_y} = .9014$.

Proceeding as before by formula (xvii) we find

$r_{xy} = \frac{-.228{,}76485}{.868{,}606 \times .812{,}475} = -.3242,$

a result which is again in excellent agreement with the previous ones.














Lastly I convert my table into a 3 x 3 table:

                     Intelligence.

  Clothing  | B+C+D |  E  | F+G | Totals
  ----------+-------+-----+-----+-------
  I         |  194  | 209 | 233 |   636
  II        |  343  | 255 | 153 |   751
  III+IV+V  |  219  |  71 |  48 |   338
  ----------+-------+-----+-----+-------
  Totals    |  756  | 535 | 434 |  1725








* The negative sign shows that worse intelligence is associated with the poorer clothing. 
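I found $\bar{x}_s = -.8995,\ +.2428,\ +1.2674,$
$\bar{y}_t = -1.0066,\ +.2173,\ +1.4110,$

and $S\left(\frac{n_s\bar{x}_s^2}{N\sigma_x^2}\right) = .776{,}941$, $S\left(\frac{n_t\bar{y}_t^2}{N\sigma_y^2}\right) = .784{,}241$,
$r_{xC_x} = .8814$, $r_{yC_y} = .8856$.

Proceeding exactly as before by formula (xvii) we have

$r_{xy} = \frac{S\left(\frac{n_{st}\bar{x}_s\bar{y}_t}{N\sigma_x\sigma_y}\right)}{.784{,}241 \times .776{,}941} = -.3348.$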


For comparison I worked out the above 3 x 3 table by contingency. The
contingency corrected for 3 x 3 cells is $\phi^2 = .076{,}0779$, and therefore $C = .2659$,
hence

$r_{xy} = .2659/(.8814 \times .8856) = .3406$, regardless of sign.
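To sum up our results for this case we have:

               | Bi-Serial $\eta$ | Corrected Contingency | Formula (xvii)
  -------------+------------------+-----------------------+---------------
  6 x 2 Table  |  .340 and .343   |          -            |       -
  6 x 4 Table  |        -         |        .3342          |     .3171
  4 x 4 Table  |        -         |          -            |     .3242
  3 x 3 Table  |        -         |        .3406          |     .3348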


The results are very satisfactory and show that for correlations of this
magnitude our corrective factors work excellently when we use contingency to
find $r_{C_xC_y}$. Formula (xvii) also gives fairly consistent results, but this is partially
due to the relative smallness of $r$. For the smaller $r$ the more nearly $\overline{x_s y_t}$
may be replaced by $\bar{x}_s \times \bar{y}_t$, as is well shown in the case of the simple 2 x 2 table.

On my scale of Intelligence seven categories are used; there were in Mr Gilby’s
schools none of the A or Mentally Defective class, so that he only used six, B–G.
We see that there is a high degree of correlation between the actual intelligence
of a child and its class index. We must have

  For seven classes  r > .97,
   "  six      "     r = .97,
   "  four     "     r = .93,
   "  three    "     r = .88.


It is of interest to put beside these the frequencies of my General Health
Scale and the resulting class index correlations. The scale had six classes:
Very Robust, Robust, Normally Healthy, Rather Delicate, Delicate and Very
Delicate. Applied to 2037 schoolgirls (Biometrika, Vol. III. p. 166) it gave:


                          |  V. R.  |   R.   |  N. H. | R. D.+D. |  V. D.  | Total
  ------------------------+---------+--------+--------+----------+---------+------
  Frequency               |   109   |  578.5 |  803.5 |   478.5  |   67.5  | 2037
  $\bar{x}_s/\sigma_x$    | -2.0343 | -.9033 | +.0936 |  +1.0837 | +2.2993 |

leading to $S\left(\frac{n_s\bar{x}_s^2}{N\sigma_x^2}\right) = .897{,}1616$, and $r_{xC_x} = .9472$.
If we divide into four classes, thus:

                          |  V. R.  |   R.   |  N. H. | R. D.+D.+V. D. | Total
  ------------------------+---------+--------+--------+----------------+------
  Frequency               |   109   |  578.5 |  803.5 |      546       | 2037
  $\bar{x}_s/\sigma_x$    | -2.0342 | -.9033 | +.0936 |    +1.2253     |

leading to $S\left(\frac{n_s\bar{x}_s^2}{N\sigma_x^2}\right) = .858{,}9714$, and $r_{xC_x} = .9268$.


And lastly if we divide into three classes, thus:

                          | V. R.+R. |  N. H. | R. D.+D.+V. D. | Total
  ------------------------+----------+--------+----------------+------
  Frequency               |   687.5  |  803.5 |      546       | 2037
  $\bar{x}_s/\sigma_x$    |  -1.0826 | +.0936 |    +1.2253     |

leading to $S\left(\frac{n_s\bar{x}_s^2}{N\sigma_x^2}\right) = .801{,}3866$, and $r_{xC_x} = .8952$.


Now if we put these results with those on p. 121 together we find:

  Number    | Gaussian Frequency,  | Pearson's Scale,     | Pearson's Scale,
  of Groups | Equal Subfrequencies | General Intelligence | General Health
  ----------+----------------------+----------------------+-----------------
      3     |        .891          |        .881          |      .895
      4     |        .928          |        .932          |      .927
      5     |        .947          |         -            |      .947
      6     |        .959          |        .966          |       -




It will be noticed that the frequencies differ considerably, although not
extremely, from equality. Accordingly we may find it sufficient in many cases,
although the frequencies are not symmetrical or equal, to state the value of $r_{xC_x}$
for the same number of groups with equal frequencies from the table on p. 121,
and save the trouble of calculating $r_{xC_x}$.
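
The class-index correlations for equal Gaussian subfrequencies can be reproduced with a short Python sketch. It is offered only as an illustration under stated assumptions: the function name `class_index_correlation` is hypothetical, the variate is assumed to be exactly Gaussian, every class frequency is assumed positive, and each class mean is taken as the mean of the standard normal curve between the two dividing ordinates fixed by the class frequencies.

```python
from math import sqrt
from statistics import NormalDist


def class_index_correlation(frequencies):
    """Return (class means in sigma units, r) for ordered class frequencies,
    assuming the underlying variate is standard Gaussian."""
    std = NormalDist()
    total = sum(frequencies)
    means = []
    cumulative = 0.0
    lower_pdf = 0.0  # ordinate of the normal curve at minus infinity
    for i, n in enumerate(frequencies):
        cumulative += n
        is_last = i == len(frequencies) - 1
        # Ordinate at the upper boundary of this class (zero at plus infinity).
        upper_pdf = 0.0 if is_last else std.pdf(std.inv_cdf(cumulative / total))
        # Mean of a standard normal variate confined to this class.
        means.append((lower_pdf - upper_pdf) / (n / total))
        lower_pdf = upper_pdf
    # r^2 = S(n_s * mean_s^2) / N when sigma = 1.
    r = sqrt(sum(n * m * m for n, m in zip(frequencies, means)) / total)
    return means, r


for groups in (3, 4, 5):
    _, r = class_index_correlation([1] * groups)
    print(groups, round(r, 3))

# The three health classes of the text, for comparison with the printed value.
health_means, health_r = class_index_correlation([687.5, 803.5, 546])
```

For unequal frequencies such as the health classes, a machine computation may differ from the printed hand-computed values in the third decimal place.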


Illustration III. I took the 3 x 3 table for correlation of health in pairs of
sisters, i.e.

                                First Sister.

  Second Sister   | V. R.+R. | N. H. | R. D.+D.+V. D. | Totals
  ----------------+----------+-------+----------------+-------
  V. R.+R.        |   428    |  172  |      87.5      |  687.5
  N. H.           |   172    |  411  |     220.5      |  803.5
  R. D.+D.+V. D.  |   87.5   | 220.5 |      238       |  546
  ----------------+----------+-------+----------------+-------
  Totals          |  687.5   | 803.5 |      546       |  2037














and I determined the contingency. The raw value of $N\phi^2 = 424.9285$, and
after correction for number of cells $= 420.9285$. Hence
$\phi^2 = .206{,}6414,$
$C = .4138.$
Therefore by the value just found for $r_{xC_x}$ for three health classes
$r_{xy} = C/r_{xC_x}^2 = .5164.$
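
The contingency calculation just described may be sketched as follows. The sketch rests on two assumptions that are inferred from the printed figures rather than stated in this section: that the correction for number of cells consists in subtracting (rows − 1)(columns − 1) from $N\phi^2$, and that $C = \sqrt{\phi^2/(1+\phi^2)}$. The function name `corrected_contingency` is hypothetical.

```python
from math import sqrt


def corrected_contingency(table):
    """Coefficient of mean square contingency C after the cell correction."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Raw mean square contingency: phi^2 = S(n_st^2 / (n_s * n_t)) - 1.
    phi2_raw = sum(
        cell * cell / (row_totals[i] * col_totals[j])
        for i, row in enumerate(table)
        for j, cell in enumerate(row)
    ) - 1.0
    # Assumed correction: subtract (rows - 1)(columns - 1) from N * phi^2.
    phi2 = phi2_raw - (len(row_totals) - 1) * (len(col_totals) - 1) / n
    # Note: phi2 can turn negative for nearly independent tables; not handled here.
    return sqrt(phi2 / (1.0 + phi2))


# Health in pairs of sisters, 3 x 3 table of the text.
sisters = [
    [428.0, 172.0, 87.5],
    [172.0, 411.0, 220.5],
    [87.5, 220.5, 238.0],
]
r_class_index = 0.8952  # r_{xC_x} for three health classes, as printed above

c = corrected_contingency(sisters)
r_xy = c / (r_class_index * r_class_index)
print(round(c, 4), round(r_xy, 4))
```

Because both sisters are classed on the same scale, the same class-index correlation is used twice in the divisor.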
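I then worked out $r_{xy}$ by aid of the approximate formula

$$r_{xy} = \frac{S\left(\frac{n_{st}}{N}\,\frac{\bar{x}_s}{\sigma_x}\,\frac{\bar{y}_t}{\sigma_y}\right)}{S\left\{\frac{n_s}{N}\left(\frac{\bar{x}_s}{\sigma_x}\right)^2\right\}\, S\left\{\frac{n_t}{N}\left(\frac{\bar{y}_t}{\sigma_y}\right)^2\right\}},$$

which gives in this case $r_{xy} = .4940$.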

The value as given in my Huxley Lecture* for the mean of two fourfold
tables with divisions first between Robust and Healthy and then between Delicate
and Healthy was .51. The agreement between the three methods appears
reasonably good. I compared the last value with the same formula applied to
the original 5 x 5 table (see Biometrika, Vol. III, p. 166). I found

$S\left(\frac{n_{st}\bar{x}_s\bar{y}_t}{N\sigma_x\sigma_y}\right) = .392{,}7573$*,

and $r_{xy} = .4880$.

It will be clear that for the type of argument which is generally based on such
numbers the accordance is very satisfactory.


* In the case of the 3 x 3 table this product was .317,2398, which measures the amount of
correction made by denominator.





136 Influence of “ Broad Categories” on Correlation 


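Illustration IV. I take now an extreme case, namely the Table:

                        A.

  B.      |  a_1  |  a_2  |  a_3  | Totals
  --------+-------+-------+-------+-------
  b_1     |  4983 |   761 |    49 |   5793
  b_2     |  1166 |  1661 |   572 |   3399
  b_3     |    30 |   248 |   530 |    808
  --------+-------+-------+-------+-------
  Totals  |  6179 |  2670 |  1151 | 10,000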





I call this an extreme case because the table is only 3 x 3, and the frequencies 
and the subranges are very unequal. 


We have $\bar{x}_s/\sigma_x = -.6172,\ +.7011,\ +1.6871,$
$\bar{y}_t/\sigma_y = -.6750,\ +.7100,\ +1.8531.$

Whence I find $S\left(\frac{n_s\bar{x}_s^2}{N\sigma_x^2}\right) = .694{,}222$, $r_{xC_x} = .8332$,
$S\left(\frac{n_t\bar{y}_t^2}{N\sigma_y^2}\right) = .712{,}753$, $r_{yC_y} = .8442$.

I then worked out the mean square contingency and found for crude numbers
$N\phi^2 = 5146.6$.

Corrected for number of cells $\phi^2 = .51426$.
Whence $C = .5827$.
Therefore $r_{xy} = .5827/(.8332 \times .8442) = .8284$.


The actual table has been constructed in round numbers as the distribution
of a Gaussian frequency surface for $r = .80$, the divisions being at $x/\sigma_x = +.3$
and $+1.2$ and $y/\sigma_y = +.2$ and $+1.4$, and the correlation is hardly likely to differ
by a unit from .80. Considering the marked inequality of the subfrequencies
the corrected contingency approaches closely to the correlation—at least the 
approach is sufficient for any argument likely to be drawn from the data in a 
3 x 3-fold table. 

Illustration V. It is profitable to show the amount of error which will be 
introduced by taking a marked case of non-Gaussian frequency. When looking 
out for such cases many years ago, I found one of the most representative 
instances in the case of barometric heights. I take the following table from 
the memoir by Dr Lee and myself*, and I have arranged the 3 x 3-fold table 
so as to give very unequal frequencies. We have thus in this case non-Gaussian 
frequency and very unequal frequencies. We find 

$\bar{x}_s = -1.1479,\ -.1581,\ +.9161,$
$\bar{y}_t = -1.2797,\ -.2940,\ +.8424,$

$S\left(\frac{n_s\bar{x}_s^2}{N\sigma_x^2}\right) = .768{,}494$, $r_{xC_x} = .8823$,
$S\left(\frac{n_t\bar{y}_t^2}{N\sigma_y^2}\right) = .763{,}093$, $r_{yC_y} = .8736$.


* Phil. Trans. Vol. 190, A, p. 453.
                               Southampton.

                 | Over 30.15 | 30.15–29.95 | Under 29.95 | Totals
  ---------------+------------+-------------+-------------+-------
  Over 30.15     |    545     |    148.5    |     26.5    |  720
  30.15–29.85    |   263.25   |    340.75   |    217.5    |  821.5
  Under 29.85    |    83.75   |    288.75   |    1008     | 1380.5
  ---------------+------------+-------------+-------------+-------
  Totals         |    892     |     778     |    1252     | 2922


We have raw $N\phi^2 = 1449.52$.
$\phi^2$ corrected for number of cells $= .494{,}7028$, $C = .5753$.

$r_{xy} = \frac{.5753}{.8823 \times .8736}$

Found by the product-moment method*:
Without Sheppard: $r_{xy} = .7752 \pm .0050$, with Sheppard: $r_{xy} = .7802 \pm .0049$.

Thus the difference is more than four times the probable error, but is not of a 

nature upon which any sweeping conclusions would be drawn. Indeed some 

might prefer the value drawn from the corrected 3 x 3-fold table, owing to the 
distrust of isolated outlying observations which often widely modify the constants 
of a distribution calculated by product-moment methods. 

To still further test the pliability of the method, I rearranged the data in
a 3 x 3-fold table with markedly unequal frequencies but nearly equal ranges,
thus:

                               Southampton.

                 | 31.05–30.15 | 30.15–29.25 | 29.25–28.35 | Totals
  ---------------+-------------+-------------+-------------+-------
  30.85–29.85    |    808.25   |    733.25   |      0      | 1541.5
  29.85–28.85    |     83.75   |   1223.25   |     45.5    | 1352.5
  28.85–27.85    |      0      |     14      |     14      |   28
  ---------------+-------------+-------------+-------------+-------
  Totals         |    892      |   1970.5    |     59.5    | 2922

A more unpromising series of totals is hardly likely to be met with! We
have

$\bar{x}_s = -1.1479,\ +.4467,\ +2.4133,$

$S\left(\frac{n_s\bar{x}_s^2}{N\sigma_x^2}\right) = .635{,}4498$, $r_{xC_x} = .7972$,

$\bar{y}_t = -.7544,\ +.8044,\ +2.6797,$

$S\left(\frac{n_t\bar{y}_t^2}{N\sigma_y^2}\right) = .568{,}512$, $r_{yC_y} = .7540$.

Thus we reach some of the lowest values we have come across for $r_{xC_x}$ and
$r_{yC_y}$ in the case of threefold divisions.


* In Phil. Trans. Vol. 190 A, p. 453, the value .7572 was printed in error for .7752.
Raw $N\phi^2 = 766.887$, corrected $\phi^2 = .261{,}0838$ and $C = .4550$,
$r_{xy} = .4550/(.7972 \times .7540) = .7570$,
a result differing only by .02 from that of the product-moment method.


It will be noticed that I have arranged the material so that the subranges 
are equal. If we work the 3 x 3-fold table out, using Sheppard’s correction, we 


find by product-moment
$r_{xy} = .7746$,
which is considerably nearer the mark than the corrected contingency value.


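In this case a poor result is given by the formula:

$$r_{xy} = S\left(\frac{n_{st}\bar{x}_s\bar{y}_t}{N\sigma_x\sigma_y}\right) \Big/ \left\{ S\left(\frac{n_s\bar{x}_s^2}{N\sigma_x^2}\right) \times S\left(\frac{n_t\bar{y}_t^2}{N\sigma_y^2}\right) \right\},$$

the assumption that $S(n_{st}\,\overline{x_s y_t}) = S(n_{st}\,\bar{x}_s\bar{y}_t)$ being by no means satisfactory for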
(i) very non-Gaussian distributions, or (ii) high correlations, when the subranges 
if equal are very large. For equal ranges with a moderate approach to the 
Gaussian distribution and a correlation not much exceeding 0°5, it gives quite 
good results. Thus if we take the table for stature in Father and Son and 
arrange it in practically equal subranges thus: 


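                          Stature of Father.

  Son            | 58.5–64.5 | 64.5–70.5 | 70.5–76.5 | Totals
  ---------------+-----------+-----------+-----------+-------
  Under 67.5     |   86.25   |    269    |    9.75   |   365
  67.5–73.5      |   39.25   |   495.5   |  129.25   |   664
  73.5–79.5      |    1      |    22     |    26     |    49
  ---------------+-----------+-----------+-----------+-------
  Totals         |  126.5    |   786.5   |   165     |  1078

we find: $\bar{x}_s = -1.6780,\ -.0540,\ +1.5439,$
$\bar{y}_t = -1.0804,\ +.4387,\ +2.1022,$

$S\left(\frac{n_s\bar{x}_s^2}{N\sigma_x^2}\right) = .697{,}738$, $r_{xC_x} = .8353$,
$S\left(\frac{n_t\bar{y}_t^2}{N\sigma_y^2}\right) = .714{,}644$, $r_{yC_y} = .8454$,
$S\left(\frac{n_{st}\bar{x}_s\bar{y}_t}{N\sigma_x\sigma_y}\right) = .260{,}7392$,
$r_{xy} = .5229$,

against .5259 by the product-moment methods.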


The same table yields by 3 x 3-fold contingency
$N\phi^2 = 163.909$, corrected $\phi^2 = .148{,}3386$, $C = .3594$,
whence $r_{xy} = .5089$.
Thus the corrected contingency for roughly equal frequencies and equal subranges
gives respectively .52 and .51, but the formula (xvii) gives for equal ranges
.52 and for roughly equal frequencies .57. It seems to me therefore that corrected
contingency can be more safely applied than formula (xvii).


Conclusions. 


A further discussion of the corrections needful when using the method of 
mean square contingency will shortly be published, but the present paper seems 
to indicate that it can in the bulk of cases for 3 x 3-fold, 4 x 3-fold or 4 x 4-fold
tables be used effectively. I am not aware that any other effective method has 
been proposed for such tables. The assumption of the Gaussian distribution to 
determine the correlation of the variate with its class index need not be made, 
if the material thrown into broad classes has had a sample quantitatively 
determined. But we have seen in this paper that the Gaussian assumption to 
fix the means of the broad classes is in many cases amply sufficient to give good 
results even in non-Gaussian frequencies. Of course a control series to determine 
$r_{xC_x}$, where practicable, may be advantageously sought*, but until some better
method can be suggested, the present seems to me the best available for dealing 
with the correlation of variates classed in a few broad categories. 


With contingency tables of 5 x 5 or 6 x 6, the correcting factors will generally 
be small, but corrections must certainly be made for 4 x 4 and 3 x 3 tables. 


* For example, I took Southampton frequency, and calculated from the original table the means of
the three groups 892, 1970.5, 59.5, corresponding to the table on p. 137; they are -1.1006, +.4123,
and +2.1746, leading to $r_{xC_x} = .8056$, as against the Gaussian value .7972. This is for a sensibly
non-Gaussian frequency, and a much worse agreement is permissible in actual practice, where the
nature of the argument rarely turns on correlation differences of less than .05.
THE recent paper by Mr Yule entitled “On the Methods of Measuring Association
between Two Attributes*” calls for an early reply on two grounds, first
because its singularly acrimonious tone is to us wholly inexplicable, not to say 


views are accepted, 


irreparable damage will be done to the growth of modern statistical theory. 





* Journal of the Royal Statistical Society, Vol. LXXV, pp. 579–652. London, 1912.
Mr Yule has invented a series of statistical methods which are in no case based
on a reasoned theory, but which possess the dangerous fascination of very easy and
ready application, and therefore are at once seized upon as applicable to all sorts
of problems by those who are without adequate training in statistical theory,
or without the mathematical knowledge requisite to weigh cautiously their logical
basis. The methods to which we refer are these:

(i) the use of the so-called “coefficient of association” to measure the
relationship of two attributes;

(ii) the use of a new coefficient, which Mr Yule terms a “coefficient of
colligation,” apparently to be used in like cases with the coefficient of association;

(iii) the development of a method which first appeared in a paper by the late
Mr John Gray*; in this method each group of a contingency table is considered
as a cell of unit subrange for both variates. This assumption being made, Mr Yule
calculates the coefficient of correlation by the product-moment method, and on
the basis of this procedure terms his coefficient the coefficient of correlation and
uses the customary letter $r$ for it.


Such a terminology is absolutely unjustifiable and can only confuse the 
uninstructed and undiscerning reader. If the groups were extended from 5 or 8 
to an indefinite number, all Mr Yule would reach by this method would be a 
correlation of ranks, not of variates. As it is, he has obtained a correlation of 
ranks with enormous “brackets†.” It does not seem to have occurred to him that
the correlation of ranks may be quite different from the true correlation of variates, 
and that in cases where we do know the relationship the correlation of ranks 
is sensibly lower than that of variates. Further, he makes no suggestion that a 
very fundamental correction—that of the variate and class-index correlation—is 
needful before this method could possibly be applied to deduce a limit to the true 
correlation of variates. For these reasons we shall term Mr Yule’s latest method 
of approaching the problem of relationship of attributes the method of pseudo- 
ranks. We are concerned principally therefore in this paper with the so-called 
method of association and the method of pseudo-ranks. In addition we deal 
incidentally with other coefficients and reply to certain criticisms, not to say 
charges, Mr Yule has made against the work of one or both of us. 


(2) History of Subject. 


In view of the misstatements made in the discussion at the Royal Statistical 
Society, with regard to the history of the subject, a few preliminary remarks of an 
historical nature may be fitly made here. Mr Sanger, for example, said that “all 
statisticians before Mr Yule had this passion for the normal curve‡.” This statement


* Gray and Tocher, Journal of Anthrop. Institute, 1900, Vol. XXX. p. 111.

† On the difficulty of “brackets” in the correlation of ranks, see Pearson, “On Further Methods
of determining Correlation,” Drapers’ Research Memoirs, Dulau and Co., p. 36.

‡ Journal Roy. Stat. Soc. Vol. LXXV. p. 645.
is not only unfair to Perozzo, but to one of us, whom Mr Yule was directly attacking.
Pearson’s memoir on “Skew Variation in Homogeneous Material” was sent to 
the Royal Society on December 19, 1894. Mr Yule’s memoir on Association 
was presented on October 20, 1899. He had previously attended the statistical
lectures of Pearson and been an assistant to him during a period when nearly 
the whole work of the statistical laboratory turned on non-Gaussian distributions. 
A collection was then made of non-Gaussian material with a view to dealing with 
the correlation surfaces of continuous non-Gaussian variates. Among the material 
especially selected as providing extreme cases were (i) barometric heights (memoir 
presented 1897) and (ii) ages of husband and wife; of the latter the laboratory 
still possesses the contour lines drawn by Mr Yule under Pearson’s direction to 
indicate extreme cases of what the latter has termed skew variation, or wide 
deviation from the normal surface. Another illustration of marked skewness is 
that of the contour lines of the correlation between the numbers of a particular 
suit in two partners’ hands at whist. These latter curves were published by one of 
us in 1894, and at that time* it was distinctly stated that the contour curves for 
ages of husband and wife differed widely from the Gaussian type. It is singular 
that Mr Yule in the paper we are about to discuss should have made use of two of 
the extreme types of non-Gaussian frequency with which he was very familiar when 
he was an assistant in the University College Department of Applied Mathematics, 
and yet have allowed such a statement as that of Mr Sanger’s to remain uncontradicted.
The fact is that the promise made in 1894† to deal with skew
correlation surfaces only remained unfulfilled because the differential equations to 
the surfaces obtained in that year have so far defied integration. Undoubtedly for 
continuous variates a generalised correlation surface should be the starting-point 
for attacking the problem of association‡. That it has remained unsolved shows
only the extreme difficulty of the problem; it does not indicate that all 
statisticians before Mr Yule had “this passion for the normal curve.” 


And here we will at once emphasise the fundamental difference between 
Mr Yule and ourselves. Mr Yule, as we will indicate later, does not stop to discuss 
whether his attributes are really continuous or are discrete, or hide under discrete 
terminology true continuous variates. We see under such class-indices as ‘death’
or ‘recovery,’ ‘employment’ or ‘non-employment’ of mother, only measures of
continuous variates—which of course are not a priori and necessarily Gaussian. 
Mr Greenwood in the discussion on Mr Yule’s paper referred to the jibe§ about 





* Phil. Trans. Vol. 186 A, p. 411.

† Ibid. p. 411.

‡ Such a surface, however, involves 5, 7, or more independent constants, not the three of the
Gaussian surface, and as the fourfold table has, apart from its total, only three available constants, we
could not hope to determine such a surface for a fourfold table without some additional knowledge or
hypothesis, other than conveyed by the table itself, as to the nature of the frequency.

§ “We are considering,” writes Mr Yule, “simply the performance as against the non-performance
of the operation of vaccination. Similarly all those who have died of small-pox are equally dead: no
one of them is more dead or less dead than another, and the dead are quite distinct from the survivors”
persons being ‘dead’ or ‘not dead’ and questioned whether Mr Yule was correct
in treating the variate behind the class-index as discrete and not continuous. 
In the original paper of Pearson “strength to resist smallpox when incurred” 
was stated to be the variate, and all the evidence that has been produced since 
indicates its continuity; in precisely the same way vaccination and non-vacci- 
nation represent degrees of immunity in a continuous variate of which area of 
vaccination as indicated by extent of cicatrix and period since vaccination are 
contributory quantitative factors. Again “employment or non-employment of the 
mother” are not taken by us as signifying the presence or absence of a mere 
discrete attribute (for example whether she works in a factory or not) but as a
class-index indicating that employed women, who have not only their home work
but factory labour also, have on the whole more physical exertion to endure than 
those who are simply housewives. We are really seeking how far the continuous 
variate physical exertion of women affects infant welfare, and this is not a discrete 
variate any more than survival or death of infant is a discrete variate, when you 
view them merely as class-indices of physical fitness to survive in the child. In 
other words, for the great bulk of attributes, to which Mr Yule without analysis of 
their nature applies association, we should assert continuous variation. We hold 
therefore that in the main we are applying fourfold or other class divisions to 
continuous variates. Mr Yule thinks he has freed himself from all consideration 
of what the nature of this continuity may be; we consider his belief wholly 
fallacious. You cannot free yourself from some assumption as to the nature of the 
distribution when you are dealing with the association of attributes. And in 
ignorance of what the true distribution may be, what assumption will help you to 
the most probable result? On the basis of a very large experience of frequency 
curves and surfaces we have no hesitation in saying that up to the present time no 
distribution has been proposed which roundly represents experience so effectively
as the Gaussian frequency. One of the present writers has indicated over and 
over again how it fails, and he has measured the significance of its failure, but has 
always recognised that he must put against this the large percentage of cases in 
which it gives reasonable results, close enough for all practical purposes*. Mr Yule 


(J. R. S. S. Vol. LXXV. p. 612). Who the “we” are, Mr Yule does not tell us; but suppose “we”
started to find the relation between age and place in an examination—say the mathematical tripos— 
should we learn more or less by treating the wranglers as a class-mark with no graduations and age 21 
as a fixed division, or by assuming that the fourfold table: wrangler—not wrangler, minor—not minor, 
really covered continuous variations of age and class-place? Vaccination means vaccinated a week 
ago or forty years ago, a graduated immunity; the dead means a group who not only had no power 
to resist an attack of the given intensity, but in certain of the cases an attack of a far less intensity—it 
covers a class with graduated power of resistance. Mr Yule here as elsewhere is mastered by words, not 
seeking the realities behind classification. 

* See, for example, C. D. Fawcett, Biometrika, Vol. I. p. 443; W. R. Macdonell, Vol. I. p. 184,
Vol. III. p. 227; Pearson and Lee, Vol. II. pp. 362–7; R. Pearl, Vol. IV. p. 40; etc. etc. Compare,
however, J. F. Tocher, Vol. V. p. 300, who for long series finds a certain amount of deviation from
normality, generally in the direction of “leptokurtosis” ($\beta_2$ is large) not of much asymmetry ($\beta_1$ is small).
Even in these cases we doubt whether any serious practical error would be introduced by the use of the
Gaussian distribution, unless extreme dichotomies are made.
with a bias which may well be called in question has gone out of his way to
select extreme cases, which had already been indicated by one of us as markedly 
non-Gaussian, but he makes no attempt to measure the wide range of physical 
characters for which the Gaussian is a legitimate practical assumption. Mr Yule 
refers to Dr Macdonell’s memoir as a case in which the applicability of the Gaussian 
fourfold table method was “in the first place adequately tested” before adoption *. 
He leaves his uninitiated reader ignorant of two important facts, (i) that in the 
majority of fourfold classifications there is no possibility of such adequate testing 
because only the fourfold division has been provided, and (ii) the test in this case 
was directly made at the suggestion of Pearson and in his Laboratory to test the 
efficiency of the Gaussian method on ordinary data such as form probably nine-tenths 
of the frequencies which occur in practice. The work on this paper of Macdonell’s 
began almost immediately on the completion of the theoretical memoir of 1900 on 
the fourfold table, and Mr Yule’s statement that the warning of Pearson in the 
fundamental memoir of 1900 that normal correlation was not universal “seems to 
have been forgotten in a few weeks at most†” is, as many others of his statements
from the historical standpoint, hopelessly inaccurate. Thus the paper on eye-colour
in man and coat-colour in horses was presented in August, 1899, and
antedates the presentation of the theoretical paper of February, 1900. The 
“warning” could hardly have been promptly forgotten, for the paper was with- 
drawn and rewritten in order to test the value of the method of association then just 
propounded by Mr Yule, and to develop, when that was found defective, what, it was 
believed, was a better treatment. Mr Yule writes that “ Professor Pearson raised no 
objection then and as far as I know has raised no objection since to my coefficient 
Q; indeed he referred to ‘the extreme elegance and simplicity of Mr Yule’s 
coefficient of association.’” Naturally when one finds a method wholly inadequate 
one does not turn and rend an old pupil and former colleague. What Pearson did 
do was to test Mr Yule’s Q against other similar coefficients and finding it less 
stable than any of them, it was dropped and has never been and never will be 
used in any work done under his supervision. But an interesting point arises 
here, which it is, perhaps, worth mentioning. Endeavouring to find for any
fourfold division an analogue to Sheppard’s median division formula, i.e. for a
Gaussian fourfold

    a | b
    c | d

$$r = \cos\frac{\pi b}{a+b} = \sin\frac{\pi}{2}\,\frac{a-b}{a+b},$$

Pearson hit upon the fact that the fourfold

    $\sqrt{ad}$ | $\sqrt{bc}$
    $\sqrt{bc}$ | $\sqrt{ad}$


* Journ. of R. S. S. Vol. LXXV. p. 630.
† Ibid. p. 614.
has the same association coefficient as

    a | b
    c | d

and that accordingly if the coefficient of association were a valid measure of
relationship every fourfold could be expressed in a form which led to a Gaussian

$$r = \sin\frac{\pi}{2}\left(\frac{\sqrt{ad}-\sqrt{bc}}{\sqrt{ad}+\sqrt{bc}}\right) = Q_3.$$

Thus Pearson’s $Q_3$ was a direct result of writing the fourfold in the “equalised”
form

    $\sqrt{ad}$ | $\sqrt{bc}$
    $\sqrt{bc}$ | $\sqrt{ad}$

which Mr Yule now proposes as a primary virtue of his method as giving all
classes their “natural” or equal percentages. The new “coefficient of colligation”
is thus really an old friend, which under the form $Q_3$ did not possess “the fundamentally
different properties*” with which Mr Yule credits it, being the direct
and we venture to think legitimate offspring of the equalised frequency table
as figured above.
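
The relation between the equalised table and the two coefficients can be checked numerically. The following Python sketch is illustrative only: the function names and the sample frequencies are hypothetical, Mr Yule's coefficient of association is taken in its usual form $(ad-bc)/(ad+bc)$, and the coefficient of colligation is assumed to be $(\sqrt{ad}-\sqrt{bc})/(\sqrt{ad}+\sqrt{bc})$.

```python
from math import cos, pi, sin, sqrt


def association(a, b, c, d):
    """Coefficient of association Q = (ad - bc) / (ad + bc)."""
    return (a * d - b * c) / (a * d + b * c)


def equalised(a, b, c, d):
    """Equalised fourfold: sqrt(ad) on the diagonal, sqrt(bc) off it."""
    m, k = sqrt(a * d), sqrt(b * c)
    return m, k, k, m


def colligation(a, b, c, d):
    """Assumed form of the coefficient of colligation."""
    m, k = sqrt(a * d), sqrt(b * c)
    return (m - k) / (m + k)


def pearson_q3(a, b, c, d):
    """Gaussian r obtained by the median-division formula on the equalised table."""
    return sin(pi / 2 * colligation(a, b, c, d))


# Hypothetical frequencies, chosen only for illustration.
table = (40.0, 10.0, 20.0, 30.0)
ea, eb, ec, ed = equalised(*table)

same_q = abs(association(*table) - association(ea, eb, ec, ed)) < 1e-12
# Median-division formula applied to the equalised table: cos(pi * b / (a + b)).
sheppard = cos(pi * eb / (ea + eb))
print(same_q, abs(sheppard - pearson_q3(*table)) < 1e-12)
```

The check shows only the algebraic identity; it says nothing as to whether either coefficient is a valid measure of relationship.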





The next historical point where Mr Yule seems to be at fault, at any rate in
his criticisms of one of the present writers—is in his interpretation of the word 
correlation. He narrows it down to the significance of correlation coefficient found 
by the product-moment formula, and so obsessed is he by this idea that he applies 
it to a correlation of gross ranks, which is not a correlation of variates at all. 
The word correlation in the statistical as distinguished from the biological sense, 
we believe, was first used by Galton in his memoir of 1889 entitled: “Co-relations
and their Measurements, chiefly from Anthropometric Data†,” and he gave a
definition of it which does not involve the conception of the product moment or the
linearity of regression at all. That definition was extended by one of the present
writers in a memoir of 1895‡, and it runs: “Two organs in the same individual, or
in a connected pair of individuals, are said to be correlated, when a series of the 
first organ of a definite size being selected, the mean of the sizes of the corresponding 
second organs is found to be a function of the size of the selected first organ. If 
the mean is independent of this size, the organs are said to be non-correlated. 
Correlation is defined mathematically by any constant, or series of constants, which 
determine the above function.” It will be seen that this definition of correlation 
has nothing whatever in it that limits the use of the word ‘correlation’ to the 
coefficient of correlation as found by the product-moment method. Galton himself 
never used the product-moment method to find his “index of correlation.” He 
had generally in view§ the position that the average value of one organ or 


* Journ. of R. S. S. Vol. LXXV. p. 592, footnote.

† Royal Soc. Proc. Vol. XLV. p. 135.

‡ Phil. Trans. Vol. 187 (1896) A, p. 257.

§ “Two variable organs are said to be co-related when the variation of the one is accompanied on
the average by more or less variation of the other...” R. S. Proc. Vol. XLV. p. 135.
attribute associated with a given value of a second changes continuously as we
change that given value. It was only his wide experience of anthropometric data
which led him to believe that in most cases the function that the mean of one
organ is for a given value of the second may be adequately represented by a
straight line. There was nothing either in his own treatment or in the work of
his followers of the Biometric School, which pinned them down to the use of the
word correlation for a particular constant found by the product-moment method.
The “correlation ratio” has in the work of that school just as much significance as
the “correlation coefficient,” and it is only Mr Yule who proposes to confine the
use of the word correlation to the narrow sense of straight line regression determined
by product-moment methods. To the biometrician correlation when it
ceases to be linear is not determined by the product-moment value of r at all,
and the grade of correlation may be far higher than the value as determined by
the coefficient of correlation. In the same way other constants may be found
defining the relationship of two continuous characters, or measuring their degree of
dependence. These are equally measures of correlation in our sense of the word.
Mr Yule’s coefficient of association is not in our sense of the word a measure of
correlation at all; it shows in no manner how the mean of one attribute for a given
value of a second attribute varies as we modify this value. It is, as we shall show
below, impossible to give it in the case of continuous variates any rational
significance whatever. Where there is no true correlation at all, the size of
Mr Yule’s Q may be produced solely by a lack of homoscedasticity (of equal variation)
in the arrays of one variable associated with constant values of the second,
but in what manner it measures this heteroscedastic property is quite beyond
interpretation. Mr Yule claims that the nature of the frequency is of no consequence;
he states that the coefficient of association may be applied without any
general theory of frequency. For us this is not a correct attitude; we admit wide
deviations from Gaussian distributions, but such cases are not the rule. Mr Yule
can pick out special instances which are far from Gaussian, such as age distributions,
barometric heights, or heterogeneous mixtures of growing organs like ivy leaf
lengths. Even for such cases he has not examined in any adequate manner how
far methods based on Gaussian or other allied considerations do give practical
results, nor how far even the Gaussian fourfold r (tetrachoric r, or r_t as we will call it
for the purposes of this paper) is a more stable and reliable coefficient than those
suggested by himself.


He has indeed criticised the application of a tetrachoric r_t to eye-colour data
(his discussion of the subject will be considered in a separate section of our reply),
but he does not inform his readers that the one of the present writers concerned in
the eye-colour treatment fully admitted in June 1907* “the unsatisfactory approach
to the Gaussian distribution found in pigmentation tables” and stated that his
very knowledge of this point had led him to develop the method of contingency for
such problems. Yet Mr Yule raises five years later the applicability of the
normal coefficient, the tetrachoric r_t, to pigmentation tables without a line of
reference to a paper, which must have been perfectly well known to him, for it is
entitled “A Reply to Certain Criticisms of Mr G. U. Yule.” This may be strategic,
but at the same time illustrates the peculiar character of Mr Yule’s controversial
methods.

* Biometrika, Vol. V. p. 472.


(3) On the Boas-Yulean “Theoretical Value” of the Correlation (Pearson's φ).


We now come to a very nice point indeed, namely Mr Yule’s “theoretical 
value of the correlation coefficient,” the method introduced into his Textbook 
of Statistics (p. 212) without a word of warning as to when it should be applied. 
Mr Yule now states that: “It would have been thought that anyone reasonably 
acquainted with the theoretical work of the last decade, and especially Professor 
Pearson and his collaborators, would have found no difficulty in the passage in 
question*.” Now what exactly have Pearson and his collaborators done? They 
have applied the product-moment correlation to the presence of 0, 1, or 2
protogenic units in theoretical Mendelian investigations. They have assumed that
when a character goes by units, you may apply the usual product-moment methods.
But they have objected in toto to the application of such a method to material 
where there was reasonable evidence of continuous variation. Does Mr Yule look 
upon ‘death’ as the addition of one unit to ‘recovery’? Does Mr Yule look upon 
vaccination as the addition of one unit to ‘absence of vaccination’? Does Mr Yule 
look upon ‘mental defect’ as the addition of one unit to normal mentality? In the 
three Mendelian types (RR), (DR), and (DD) there is a progression of one unit at 
each stage in the number of D’s, but what are the units in the cases we have 
mentioned? Or, does Mr Yule suggest that his “theoretical value of the correla- 
tion” is to be confined to those actual true unit additions to which Pearson and his 
collaborators have always confined them? There is not a hint of this in his Text- 
book nor in his present paper. He has indeed carefully refrained hitherto from 
saying what are the characters of the attributes to which it is to be applied. He 
has suggested that “the ordinary theory of correlation, once that theory had 
been freed from any necessary relation to the theory of normal correlation, was 
applicable in its entirety to the 2 x 2-fold table†.” Mr Yule says that this should
be “a very obvious matter.” Indeed! Then apparently ‘vaccination’ and ‘mental
defect’ are a quantitative unit more than non-vaccination and normal mentality!
But how does this fit with Mr Yule’s other assertion that these are discrete attributes
and suitable for the application of his coefficient of association? Let us
see how Dr Boas investigates Pearson’s r_hk = φ‡.


“Correlations of phenomena that cannot be measured but only counted may
be treated in the following manner: If two events that have the probabilities
p_1 and p_2 are correlated, we may say that those cases in which the event 1 occurs
have the probability 1, or a deviation from the normal probability 1 − p_1. Those
cases in which the event 1 does not occur have the probability 0, or the deviation
from the average probability of −p_1.”

* Journal of R. S. S. Vol. LXXV. p. 609.
† Journal of R. S. S. Vol. LXXV. p. 606.
‡ Science, May 1, 1909, p. 824.


Pearson at the time read this many times through; both of the present writers
have read and re-read it since, and they fail utterly to grasp how an event can
have at the same time a probability of 1 and of 0 and a normal probability
of 1 − p_1! Mr Yule says that Professor Pearson failed “to understand what Dr Boas
was doing*.” We still fail completely to understand what Dr Boas means, or
how Mr Yule justifies the assertion that Dr Boas was demonstrating a formula
which applied to two values of a character differing by a unit. What did come
out of Dr Boas’ investigation of 1909 when it was translated into the fourfold
table terminology

       a   |   b   |  a+b
       c   |   d   |  c+d
      a+c  |  b+d  |   N

was

\[ \frac{ad - bc}{\sqrt{(a+b)(c+d)(a+c)(b+d)}}, \]

a value already known (i) as the correlation r_hk between the means, each measured
in terms of their standard deviations, of two variates of a fourfold Gaussian table†,
or (ii) as the square root of the mean square contingency of a fourfold table
without any assumption of a Gaussian distribution‡.


Now it is known that the correlation of errors in two means, r_x̄ȳ, is equal to the
correlation of deviations, r_xy, in the two variates of which x̄ and ȳ are the means.
But r_hk is not r_x̄ȳ, for h = x̄/σ_x and k = ȳ/σ_y, and σ_x and σ_y are correlated as well as
x̄ and ȳ. It is therefore clear that if x and y are continuous variables of any kind,
r_hk is not a “theoretical value of the correlation” of variates. It is a correlation of
ranks where the ranking consists of only first and second, and is wholly uncorrected
for class-index. It becomes a true value of the correlation when the two classes
differ by a unit quantity as in the units of theoretical Mendelism. There is not a
word of this in Dr Boas’ paper; he speaks of his formula as applicable when things
can be counted but not measured. Mr Yule speaks of it as applicable in its
entirety to the 2 x 2-fold table! When pressed he says it would have been
thought that no one acquainted with the work of the Biometric School (on
Mendelism) could fail to understand what his passage signified. We wholly fail
to understand it now. Is r_hk, i.e. φ, applicable to every fourfold division? If so,
why does not Mr Yule use it and drop his coefficient of association ?—In truth he 


* Journal of R. S. S. Vol. LXXV. p. 608.
† Phil. Trans. Vol. 195 A (1900), p. 12.
‡ Drapers’ Company Research Memoirs, “On the Theory of Contingency,” 1904, p. 21.
cannot do it without confessing himself hopelessly in error!* If φ or r_hk is the
right thing to use for a fourfold table as Mr Yule now suggests, then his ω and his
Q are hopelessly wrong, for all the selections which do not alter his Q (and the fact
that it is not altered by selection is according to him one of its great merits) do
alter, and this to any extent we please, his “theoretical value of the correlation.”
This very point was emphasised by one of the present writers in a paper recently
published in Biometrika†, but Mr Yule appears wholly to have missed the essential
features of that criticism. The points were that (i) there was a wide range of
values of Q, Mr Yule’s coefficient of association, for a surface of the same correlation;
if it is impossible to compare the values of Q for the same surface divided at
different places with any intelligible result, what possible comparison can be made
of Q from one system to a second?, and (ii) the values of Q and r_hk or φ are wholly
different and tend in opposite directions as we change our divisions. Under
“wholly different” we include liability to be “wholly differently interpreted.”
Both Q and φ range numerically between 0 and 1. Therefore in estimating the
meaning of Q and φ or r_hk we have to consider where they stand on this range.
Mr Yule examining the associations between developmental defects, nerve signs,
low nutrition and mental dulness finds values ranging from .75 to .95 for his
coefficient Q. He comments‡: “The associations are, however, all high (very high
compared with most coefficients of organic correlation with which one has to deal),
ranging from .784 (? .750) to .952§.” Elsewhere in the same paper Mr Yule
speaks of .174 as a “very small association,” and a .8 to .9 association “as very
high indeed||.” We know accordingly what Mr Yule understands by high and low
association. Indeed if a scale of values is to lie between 0 and 1, those approaching
0 must be very low and those approaching 1 must be very high. Now Heron
applied Mr Yule’s or Dr Boas’ “theoretical coefficient” to precisely the same data
as those for which Mr Yule had calculated his association Q, and found that it was
very high. Heron found that for Q = .921 and .753, the “theoretical value of r”
= .011 and .006 respectively. If both these ways of investigating relationship are
valid, then .011 and .006 must on a correlation scale represent a “very high degree
of association.” It would be interesting to know how Mr Yule would describe
φ = .95 or what represents a low association, if .01 corresponds to a high degree of
association! But any one who is familiar with coefficients of correlation (and φ
or r_hk is a real coefficient of correlation) knows that values of .01 and under are
extremely low values and, whatever their probable errors may be, are of no
significance for purposes of prediction. All Mr Yule can say in reply to Heron’s
statement that one of Mr Yule’s methods gives very high relationship and the

* See also pp. 172-4 below, where this point is touched on again.

† “The Danger of Certain Formulae suggested as substitutes for the Correlation Coefficient,”
by David Heron, Vol. VIII. p. 109.

‡ Here and at other points of his earlier papers, Mr Yule apparently considers that Q is really
comparable with the true correlation.

§ Phil. Trans. Vol. 194 A, p. 300. See also “high degree of association,” Theory of Statistics,
p. 34.

|| Ibid. pp. 289 and 296.
other very low relationship between the attributes under investigation, is that
“these small figures represent perfectly appreciable intensities of the product sum
correlation r.” But this is an entirely different matter*; the intensity of correlation
on the scale 0 to 1 is quite distinct from its degree of reliability. Mr Yule
would have termed .95 a very high association had it been deduced from 1000 cases,
and we term .01 a very low correlation even when deduced from a census population.
Here are the results for which we have calculated the probable errors†.


* Mr Yule himself says (loc. cit. p. 651) that the statistician must always keep apart the magnitude 
of the association and the reliability of this magnitude. We doubt the truth of this statement, but it 
cuts against his own argument at this point! 

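† The probable error of a coefficient of correlation for any frequency distribution is given by the
formula:

\[
\text{P.E. of } r = .67449\,\sigma_r = \frac{.67449\,r}{\sqrt{n}}
\left\{ \frac{p_{22}}{p_{11}^{2}} + \frac{p_{22}}{2p_{20}p_{02}} + \frac{p_{40}}{4p_{20}^{2}} + \frac{p_{04}}{4p_{02}^{2}}
 - \frac{p_{31}}{p_{11}p_{20}} - \frac{p_{13}}{p_{11}p_{02}} \right\}^{1/2}
\]

(Pearson, Drapers’ Research Memoirs, “General Theory of Skew Correlation,” p. 20).

The whole work therefore of finding $\sigma_r$ turns on evaluating $p_{11}$, $p_{22}$, $p_{31}$, $p_{13}$, $p_{40}$ and $p_{04}$ for a
fourfold table:

       a   |   b   |  a+b
       c   |   d   |  c+d
      a+c  |  b+d  |   n

when we suppose concentration at single points of the frequencies.

We write $ad - bc = \epsilon$, $(ad - bc)/n^2 = \epsilon'$,
$(a+b)(c+d)(a+c)(b+d) = q$, and $q/n^4 = q'$.
Then we have $r = \epsilon/\sqrt{q}$,

\[ p_{11} = \frac{ad - bc}{n^2} = \epsilon', \qquad p_{20} = \frac{(b+d)(a+c)}{n^2}, \qquad p_{02} = \frac{(a+b)(c+d)}{n^2}, \]

\[ p_{40} = p_{20}\,\frac{(b+d)^3 + (a+c)^3}{n^3}, \qquad p_{04} = p_{02}\,\frac{(c+d)^3 + (a+b)^3}{n^3}. \]

Writing

\[ m_1 = \frac{b+d}{n}, \quad m_1' = \frac{a+c}{n}, \quad m_2 = \frac{c+d}{n}, \quad m_2' = \frac{a+b}{n}, \quad \psi = m_1 m_1' + m_2 m_2', \]

we have

\[ \frac{(b+d)^2}{a+c} + \frac{(a+c)^2}{b+d} + \frac{(c+d)^2}{a+b} + \frac{(a+b)^2}{c+d} = n\left(\frac{\psi}{q'} - 6\right), \]

\[ \frac{p_{22}}{p_{11}^2} = \frac{q'}{\epsilon'^2} + \frac{1}{\epsilon'}\,(m_1 - m_1')(m_2 - m_2'), \]

\[ \frac{p_{22}}{2p_{20}p_{02}} = \frac{1}{2} + \frac{\epsilon'}{2q'}\,(m_1 - m_1')(m_2 - m_2'), \]

\[ \frac{p_{40}}{4p_{20}^2} + \frac{p_{04}}{4p_{02}^2} = \frac{\psi}{4q'} - \frac{3}{2}, \]

\[ \frac{p_{31}}{p_{11}p_{20}} + \frac{p_{13}}{p_{11}p_{02}} = \frac{\psi}{q'} - 6. \]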

Biometrika 1x 











On Theories of Association 


Relationship between Blindness and Mental Defect
for Different Age Groups.

  Age Group      | Boas-Yulean coefficient ± probable error
  ---------------+-----------------------------------------
   5-            |   .0113 ± .0030
  10-            |   .0100 ± .0022
  15-            |   .0065 ± .0017
  20-            |   .0061 ± .0015
  25-            |   .0046 ± .0009
  35-            |   .0060 ± .0010
  45-            |   .0053 ± .0008
  55-            |   .0059 ± .0012
  65-            |   .0028 ± .0012
  75-            |   .0031 ± .0014
  85-            |   .0058 ± .0065
  All ages 5-85  |   .0066 ± .0002


Is it conceivable that, if Mr Yule had approached the problem of the relationship
of blindness and mental defect from the standpoint of “the theoretical value”
of r and found that the maximum value of the coefficients obtained was .0113,
there would have been any talk of the high association of the two attributes? If
this on the Boas-Yulean scale means high association, what language would Mr Yule
find to describe a Boas-Yulean coefficient of .96? Instead of replying to this criticism
of Heron, Mr Yule states that this series of values confirms his view that the
association decreases with age! If the reader looks at the diagram below, he will
see the mean value of the Boas-Yulean coefficient with twice the probable error
of each age sample set off either side of it. He will note (i) that there is only one
age 75-80 where the deviation from the average value becomes significant;
(ii) that from age 15 to age 60, the polygon is practically horizontal and agrees
with the mean; (iii) the “high” values (.01 order!) occur in childhood, especially
ages 5-10, where diagnosis of mental defect is doubtful, and the low values at 65
and onwards (where they are even contradicted by age 85 onwards!), just the ages
when senile decay and old age cataract may lead the recorder of a census return to
almost any statement as to mental derangement and blindness. We feel fairly
confident that the unbiased statistician with this result before him could only conclude
that the Boas-Yulean method indicated no practically important relationship
between mental derangement and blindness and that there was no trustworthy sign
of any modification of this relationship with age. The point is not whether such
a relationship really exists or not, but that one method advocated by Mr Yule
shows definite results (high association), although a second, equally strongly
advocated, fails to give any association of practical value and is thus in direct
opposition to the first*. Mr Yule has omitted to indicate which method is the
proper one to use in such cases. Are both right or both wrong? One method
Mr Yule states may be applied to all 2 x 2-fold tables, the other should be used
for “discrete” quantities. When is a quantity “discrete”? Mr Yule confuses
“discreteness” in the class-index (a mere verbalism) with discreteness in the
attribute classified under it, and this reduces his investigations from the plane
of practical statistics to the field where we originally placed them, that of
theoretical logic.

Hence

\[
\sigma_r^2 = \frac{r^2}{n}\left\{ \frac{q'}{\epsilon'^2} + \frac{1}{\epsilon'}\,(m_1 - m_1')(m_2 - m_2')
 - \left(\frac{3\psi}{4q'} - 5\right) + \frac{\epsilon'}{2q'}\,(m_1 - m_1')(m_2 - m_2') \right\}
\]

\[
 = \frac{1}{n}\left\{ 1 - r^2 + \left(r + \tfrac{1}{2}r^3\right)\lambda\mu - \tfrac{3}{4}r^2\left(\lambda^2 + \mu^2\right) \right\},
\]

or

\[
\text{P.E. of } r = \frac{.67449}{\sqrt{n}} \left\{ 1 - r^2 + \left(r + \tfrac{1}{2}r^3\right)\lambda\mu - \tfrac{3}{4}r^2\left(\lambda^2 + \mu^2\right) \right\}^{1/2},
\]

where $\lambda = \sqrt{m_1/m_1'} - \sqrt{m_1'/m_1}$ and $\mu = \sqrt{m_2/m_2'} - \sqrt{m_2'/m_2}$.

This form agrees with that obtained by Yule and verified by Greenwood, but our deduction of it
appears to be the natural method and shows its relation to the general formula for the probable error
of r.
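The agreement between the general moment formula and the reduced form in λ and μ can be checked numerically. The following is a minimal sketch, not part of the original memoir: it assumes the two classes of each character are scored 0 and 1 (the "concentration at single points" of the footnote), the function names are illustrative only, and the ivy-leaf table A quoted later in the paper is used as test data.

```python
from math import sqrt

def phi(a, b, c, d):
    """Fourfold product-moment coefficient (ad - bc)/sqrt(product of marginal totals)."""
    return (a * d - b * c) / sqrt((a + b) * (c + d) * (a + c) * (b + d))

def pe_closed_form(a, b, c, d):
    """Probable error of r from the reduced form in lambda and mu."""
    n = a + b + c + d
    r = phi(a, b, c, d)
    m1, m1p = (b + d) / n, (a + c) / n   # column proportions
    m2, m2p = (c + d) / n, (a + b) / n   # row proportions
    lam = sqrt(m1 / m1p) - sqrt(m1p / m1)
    mu = sqrt(m2 / m2p) - sqrt(m2p / m2)
    var = (1 - r**2 + (r + 0.5 * r**3) * lam * mu
           - 0.75 * r**2 * (lam**2 + mu**2)) / n
    return 0.67449 * sqrt(var)

def pe_from_moments(a, b, c, d):
    """Probable error of r from the general product-moment formula,
    with each class scored 0 or 1 (an assumption of this sketch)."""
    n = a + b + c + d
    cells = [(0, 0, a), (1, 0, b), (0, 1, c), (1, 1, d)]  # (x, y, frequency)
    xbar, ybar = (b + d) / n, (c + d) / n

    def p(i, j):
        # central product moment p_ij about the means
        return sum(w * (x - xbar)**i * (y - ybar)**j for x, y, w in cells) / n

    r = p(1, 1) / sqrt(p(2, 0) * p(0, 2))
    brace = (p(2, 2) / p(1, 1)**2
             + p(2, 2) / (2 * p(2, 0) * p(0, 2))
             + p(4, 0) / (4 * p(2, 0)**2)
             + p(0, 4) / (4 * p(0, 2)**2)
             - p(3, 1) / (p(1, 1) * p(2, 0))
             - p(1, 3) / (p(1, 1) * p(0, 2)))
    return 0.67449 * abs(r) * sqrt(brace / n)

if __name__ == "__main__":
    table_a = (6943, 46673, 41, 6343)
    print(pe_closed_form(*table_a), pe_from_moments(*table_a))
```

For a symmetrical table λ and μ vanish and the expression reduces to .67449 √{(1 − r²)/n}, which the sketch reproduces.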


DIAGRAM I. Diagram showing absence of any relationship of practical value between
Blindness and Mental Defect.

[Diagram: Coefficient of Correlation plotted against Age (5 to 95), with the Average marked.]


(4) On Association and the Boas-Yulean Coefficient. 


Again we reach an interesting point which Mr Yule has failed to elucidate. 
It is best illustrated by an example. Take the following table for lengths of ivy 
leaves on the same spray—one of the examples selected by Mr Yule: 


* See also p. 204 below. 
A.                        Length of First Leaf.

                      | Under 6.95 mm. | Over 6.95 mm. | Totals
  Under 14.95 mm. ... |      6,943     |     46,673    | 53,616
  Over 14.95 mm.  ... |         41     |      6,343    |  6,384
  Totals          ... |      6,984     |     53,016    | 60,000

The true correlation of the material is .5672 and the tetrachoric r = .5572 ± .0058.
It may be presented in any one of the three forms (i) as A above, (ii) in one of
Mr Yule’s “natural forms*,” e.g.

B.       6,636 |  1,383 |  8,019
         1,383 |  6,636 |  8,019
         8,019 |  8,019 | 16,038

Or, again, as

C.       6,943 |     4.6673 |   6,947.6673
       410,000 | 6,343      | 416,343
       416,943 | 6,347.6673 | 423,290.6673
| 





Now Q is the same = .9167, “very high association,” for all these forms, and
a posteriori given Q we should not know whether it had arisen from A, B or C.
The values of the Boas-Yulean coefficient for these tables are:

A. φ = .1183,
B. φ = .6551,
C. φ = .0152.

Which of these values are we to adopt? Are we to use the table as it originally
is given? Or the “natural” form B? Or the selected form C? How can two
coefficients ever lead to the same results, when an adjusting process which does
not modify one, a property claimed as one of its advantages, changes the other to
any value we please from zero up to Mr Yule’s new coefficient of colligation†?
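The figures just quoted are easily reproduced. The sketch below is an illustration only (the function and variable names are not the authors'); it evaluates Q = (ad − bc)/(ad + bc) and the Boas-Yulean φ for the three presentations A, B and C of the ivy-leaf table.

```python
from math import sqrt

def yule_q(a, b, c, d):
    """Yule's coefficient of association."""
    return (a * d - b * c) / (a * d + b * c)

def phi(a, b, c, d):
    """Boas-Yulean coefficient: (ad - bc) over the root of the product of the marginal totals."""
    return (a * d - b * c) / sqrt((a + b) * (c + d) * (a + c) * (b + d))

# Cell frequencies (a, b, c, d) read row by row from the three tables in the text.
TABLES = {
    "A": (6943, 46673, 41, 6343),
    "B": (6636, 1383, 1383, 6636),
    "C": (6943, 4.6673, 410000, 6343),
}

def summarise(tables=TABLES):
    """Return {name: (Q, phi)} for each table."""
    return {name: (yule_q(*cells), phi(*cells)) for name, cells in tables.items()}

if __name__ == "__main__":
    for name, (q, f) in summarise().items():
        print(f"{name}: Q = {q:.4f}, phi = {f:.4f}")
```

Q is sensibly the same for all three forms, while φ differs by more than a factor of forty between B and C.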

Let us examine the effect of “adjusting” the tables in Mr Yule’s cases of
vaccination and death at Sheffield, Leicester and Homerton-Fulham. He asserts
that it is “natural” to take 50% of vaccinated; we fail to understand why 50%
is more or less “natural” than 70% or 95% or than the percentage which actually
occurs in the smallpox cases of those towns. Even if he takes 50% of vaccinated,
why it should be “natural” to take 50% of deaths also is to us equally mysterious,
and we believe must be to that juryman “any man of ordinary intelligence” to
whom Mr Yule appeals. The following table gives the values that would arise
from different methods of “adjusting” the tables:

Smallpox: Vaccination and Death.

                             | Yule's      |          Boas-Yulean φ           |        |        |
                             | Association |                                  |        |        |
                             |     (a)     |  (b)  |  (c)   |  (d)   |  (e)   |  r_t   |   C    |     P
  Percentage of Deaths       |     50%     |  50%  | Actual | Actual | Actual | Actual | Actual | Actual
  Percentage of Vaccinations |     50%     |  50%  |  70%   | Actual |  95%   | Actual | Actual | Actual
  ---------------------------+-------------+-------+--------+--------+--------+--------+--------+--------------
  Sheffield                  |    .902     | .630  |  .531  |  .479  |  .383  |  .769  |  .432  | 5.15/10^4[?]
  Leicester                  |    .862     | .572  |  .249  |  .233  |  .190  |  .611  |  .187  | 2.46/10^[?]
  Homerton-Fulham            |    .804     | .504  |  .423  |  .409  |  .084  |  .662  |  .379  | 1.22/10^44[?]

It is clear that the Boas-Yulean method will give any results whatever between
zero and those in the (b) column, according to what percentage we choose to take
in the adjusted table of deaths and vaccinations. We can also change the perfectly
arbitrary order that Mr Yule has given for the three towns. It appears to us
that his statement that “it should have been an obvious matter that the
ordinary theory of correlation, once that theory had been freed from any
necessary relation to the theory of normal correlation, was applicable in its
entirety to the 2 x 2-fold table” is wholly inconsistent with any validity in the
coefficient of association.

* See Journal R. Stat. Soc. Vol. LXXV. p. 590.

† Taking the usual form of the fourfold:

       a   |   b   |  a+b
       c   |   d   |  c+d

modify it by a Yulean selection using the factors l and m into

        la     |    b    |  la+b
        lmc    |   md    |  m(lc+d)
      l(a+mc)  |  b+md   |

Q remains unchanged. But φ now takes the value

\[
\phi = \frac{lm\,(ad - bc)}{\sqrt{lm\,(a+mc)(b+md)(la+b)(lc+d)}}
     = \frac{ad - bc}{\sqrt{\left(ad + bc + \dfrac{ab}{m} + mcd\right)\left(ad + bc + \dfrac{bd}{l} + lac\right)}}.
\]

This is a minimum when l and m are indefinitely great; it is a maximum when

\[ m = \sqrt{ab}/\sqrt{cd}, \qquad l = \sqrt{bd}/\sqrt{ac}, \]

or

\[ \phi = \frac{\sqrt{ad} - \sqrt{bc}}{\sqrt{ad} + \sqrt{bc}} = \omega, \text{ the coefficient of colligation.} \]
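The footnote on the selection factors l and m lends itself to a direct numerical check. What follows is only a sketch under the footnote's own assumptions (first column multiplied by l, second row by m); the helper names are hypothetical, and the ivy-leaf table A is used as data.

```python
from math import sqrt

def yule_q(a, b, c, d):
    return (a * d - b * c) / (a * d + b * c)

def phi(a, b, c, d):
    return (a * d - b * c) / sqrt((a + b) * (c + d) * (a + c) * (b + d))

def colligation(a, b, c, d):
    """Yule's coefficient of colligation, omega."""
    return (sqrt(a * d) - sqrt(b * c)) / (sqrt(a * d) + sqrt(b * c))

def select(a, b, c, d, l, m):
    """Yulean selection: first column times l, second row times m."""
    return (l * a, b, l * m * c, m * d)

def maximising_factors(a, b, c, d):
    """The pair (l, m) stated in the footnote to make phi a maximum."""
    return sqrt(b * d / (a * c)), sqrt(a * b / (c * d))

if __name__ == "__main__":
    table = (6943, 46673, 41, 6343)
    l, m = maximising_factors(*table)
    best = select(*table, l, m)
    print("Q before/after:", yule_q(*table), yule_q(*best))
    print("phi at maximum:", phi(*best), " omega:", colligation(*table))
    print("phi for very large l, m:", phi(*select(*table, 1e6, 1e6)))
```

At the stated values of l and m the selected table becomes symmetrical and φ coincides with ω, while Q never moves; with l and m very large φ sinks towards zero.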

It is not only that we can give a vast range of values to φ for a constant Q, but
equally we can give Q a whole range of values, starting at the symmetrical table
value and proceeding up to unity for a constant φ. Examine for example the
following series of fourfold tables. The first is the table as it actually occurred
with Q = .5078, the second is Mr Yule’s “equivalent” symmetrical table with of
course the same Q.

(1)   29600 |  17300 |  46900        (2)   47952 |  27398 |  75350
      37200 |  66600 | 103800              27398 |  47952 |  75350
      66800 |  83900 | 150700              75350 |  75350 | 150700
      Q = .5078, φ = .2542.                Q = .5078, φ = .2728.

We now proceed to adjust the latter table so that φ remains stationary and
Q rises.

(3)   45452 |  47103 |  92555        (4)   43052 |  56385 |  99437
      12693 |  45452 |  58145               8211 |  43052 |  51263
      58145 |  92555 | 150700              51263 |  99437 | 150700
      Q = .5511, φ = .2728.                Q = .6003, φ = .2728.

(5)   39852 |  66467 | 106319        (6)   37952 |  71808 | 109760
       4529 |  39852 |  44381               2988 |  37952 |  40940
      44381 | 106319 | 150700              40940 | 109760 | 150700
      Q = .6813, φ = .2728.                Q = .7407, φ = .2728.

(7)   35852 |  77349 | 113201        (8)   33552 |  83090 | 116642
       1647 |  35852 |  37499                506 |  33552 |  34058
      37499 | 113201 | 150700              34058 | 116642 | 150700
      Q = .8196, φ = .2728.                Q = .9280, φ = .2728.

(9)   32952 |  84541 | 117493        (10)  32552 |  85499 | 118051
        255 |  32952 |  33207                 97 |  32552 |  32649
      33207 | 117493 | 150700              32649 | 118051 | 150700
      Q = .9611, φ = .2728.                Q = .9845, φ = .2728.

(11)  32352 |  85974 | 118326        (12)  32327 |  86035 | 118362
         22 |  32352 |  32374                 11 |  32327 |  32338
      32374 | 118326 | 150700              32338 | 118362 | 150700
      Q = .9964, φ = .2728.                Q = .9982, φ = .2728.

(13)  32302 |  86095 | 118397        (14)  32298 |  86104 | 118402
          1 |  32302 |  32303                  0 |  32298 |  32298
      32303 | 118397 | 150700              32298 | 118402 | 150700
      Q = .9998, φ = .2728.                Q = 1.0000, φ = .2728.
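The behaviour of this series can be verified from the cell frequencies alone. The short sketch below (illustrative names, four of the fourteen tables only) recomputes Q and φ and confirms that φ stays at .2728 while Q climbs to unity.

```python
from math import sqrt

def yule_q(a, b, c, d):
    return (a * d - b * c) / (a * d + b * c)

def phi(a, b, c, d):
    return (a * d - b * c) / sqrt((a + b) * (c + d) * (a + c) * (b + d))

# (a, b, c, d) for tables (2), (4), (8) and (14) of the series in the text.
SERIES = {
    2: (47952, 27398, 27398, 47952),
    4: (43052, 56385, 8211, 43052),
    8: (33552, 83090, 506, 33552),
    14: (32298, 86104, 0, 32298),
}

def q_and_phi(series=SERIES):
    """Return a list of (table number, Q, phi) in table order."""
    return [(k, yule_q(*v), phi(*v)) for k, v in sorted(series.items())]

if __name__ == "__main__":
    for k, q, f in q_and_phi():
        print(f"({k}) Q = {q:.4f}, phi = {f:.4f}")
```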
The two series of results just considered show us that with φ constant we can
make Q rise to unity and with Q constant we can make φ fall to zero. It is therefore
always possible to determine or to find in actual practice series of tables in
which an ascending order of Q is accompanied by a descending order of φ. That
is to say the two coefficients will flatly contradict each other. There is no basis
whatever for Mr Yule’s assertion that in Sheffield the correlation of vaccination
and recovery is highest, while in Homerton and Fulham it is lowest. With the
same three values of Q that he gives, we can make any order of the φ’s we please.
In the actual tables the relative order is not that given by Mr Yule; the order of
tetrachoric r agrees with that of φ; so does the coefficient of mean square contingency
and further the probability of independence, i.e. the probability that
death is independent of vaccination. These are given as r_t, C, and P in the
last three columns of the table on p. 173, and they entirely reverse Mr Yule’s
judgment. P is of course in a different class to any other of the coefficients,
but we return to this point later.


Mr Yule states that any table may be reduced without modification of the
association to its equalised form; for example the tables

      998,667 |   666 |   999,333     and     300,000 | 200,000 |   500,000
          666 |     1 |       667             200,000 | 300,000 |   500,000
      999,333 |   667 | 1,000,000             500,000 | 500,000 | 1,000,000

are “equivalent,” both have Q = .385, but the inferences to be drawn from the two
tables had they originated independently are quite different. In the first case the
probable error of Q = .288, and the result is not definitely significant. In the
second case the probable error is .001 and there is no doubt of the significance.
If Q_o be the Q of an original table and Q_e of the equalised table, then we have
for Mr Yule’s vaccination data:

                          | P.E. of Q_o | P.E. of Q_e
      Sheffield           |    .007     |    .003
      Leicester           |    .065     |    .022
      Homerton and Fulham |    .007     |    .005

Mr Yule has not entered into this question of the probable errors of the series
of his modified tables. The statistician, however, whose long experience enables
him closely to associate given types of tables with given degrees of reliability,
is largely deceived when an “equivalent table” is presented to him of which the
probable error as it stands may be ½ to ⅓, or less, of the true probable error of
the coefficient given.
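The two probable errors quoted for the “equivalent” tables can be recovered with a few lines of arithmetic. The sketch assumes the usual large-sample expression for the probable error of Q, namely .6745 × ½(1 − Q²)√(1/a + 1/b + 1/c + 1/d); that expression is not written out in the text above and is supplied here as an assumption, and the function names are illustrative.

```python
from math import sqrt

def yule_q(a, b, c, d):
    return (a * d - b * c) / (a * d + b * c)

def pe_of_q(a, b, c, d):
    """Probable error of Q under the assumed large-sample formula."""
    q = yule_q(a, b, c, d)
    return 0.6745 * 0.5 * (1 - q * q) * sqrt(1 / a + 1 / b + 1 / c + 1 / d)

if __name__ == "__main__":
    original = (998667, 666, 666, 1)
    equalised = (300000, 200000, 200000, 300000)
    for label, t in (("original", original), ("equalised", equalised)):
        print(f"{label}: Q = {yule_q(*t):.3f}, P.E. = {pe_of_q(*t):.3f}")
```

The same Q of about .385 thus carries a probable error some 250 times larger in the table as observed than in its equalised form.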


We have shown in this section of our discussion that the two coefficients Q and
φ cannot both be valid. Mr Yule nowhere adequately assigns the type of cases to
which the one or the other should be applied. He tells us that he hopes the
“comedy of errors” has now ended, that comedy consisting in our overlooking the
“fact” that it is “a very obvious matter” that φ is applicable in its entirety to the
2 x 2-fold table. That applicability is the very point to which Pearson took exception
in Dr Boas’ use of φ. Were it a fact, Mr Yule might throw both his Q
and his ω overboard, for what more is required than a method applicable in its
entirety to a 2 x 2-fold table? And the φ method at all points contradicts the
Yulean results. But it is not true, for at best it would be a method of ranks, and
the correlation of ranks is never a correlation of variates unless the ranked
quantities proceed by absolute units of variation (as for example in the theoretical
Mendelian case to which Pearson perfectly legitimately applied it, or in counting
the teeth on the carapace of a prawn or the veins on a leaf).

Mr Yule seems to consider that the inoculation with an antitoxin is equivalent 
to the addition of a unit of something to the individual; we consider that this is 
wholly erroneous. To begin with, the dosage is not uniform, its repetition does not 
always occur at the same interval and the number of doses is not always the same ; 
further the interval between the onset of the disease and first inoculation is by no 
means the same; lastly, apart from the resistance of the individual patients to the 
disease, the curative effect of the treatment depends on the relation of the antitoxin 
administered to the physiological individuality of the patient. It is idle therefore 
to consider this varying complex as a quantity undifferentiated from individual to 
individual. The group treated with antitoxin is not made up of identical indi- 
viduals but of a number of persons with increased power of resistance to the disease, 
which may vary from the case of a person who has gained nothing by it to that of 
a person who has immensely increased his power of recovery. In precisely the 
same way those who have not been treated can by no means be grouped into 
a single quantitative class; it may be doubted, indeed, whether recuperative 
power when disease is incurred is really divided sharply by a line like treatment 
or non-treatment with antitoxin. It may be only the sharpest division we can 
take under the circumstances, and in our ignorance of the nature of the distribution 
a tetrachoric r_t may be as effective a measure of association as Mr Yule’s Q. At
any rate no man of “ordinary intelligence” would believe that perfect association 
existed between treatment and recovery because out of 23 persons treated none 
died, while out of 977 not treated six died, yet this would be the result provided by 
Mr Yule’s coefficient of association! Clearly if only 0.6% died without treatment,
we should not expect any to die in a sample of 23 whether treated or not treated. 
The vanishing of association for a zero quadrant is a patent fallacy *. 

If the problem were presented to us as Mr Yule states it, ie. the evaluation of 
a new treatment, we should certainly not use his Q for solving the problem. We 
should probably to-day not use a tetrachoric 7, except as a control. We should 
most likely use ¢, although certainly not in the sense of a correlation of four points. 
We should question how far death and recovery are independent of treatment or non- 
treatment ; that is to say, we should ask what is the probability that recovery 


* This topic is of such importance that we have discussed it more at length in Appendix I. 











was independent of treatment. If n be the total number of observations, then, if
nφ² = χ² be calculated, the chance of independence is at once given by finding the
P corresponding to χ² from the tables for “goodness of fit” in the case of n′ = 4.
The column of P in the table of p. 173 shows us that the Homerton-Fulham data 
stand at the head of the series in this respect. This is of course in great part due 
to the larger numbers dealt with, but obviously in such a question as the effective- 
ness of treatment weight must be given to numbers. 
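
For a fourfold table with cells a, b, c, d and n = a + b + c + d, the quantity entered in the tables can be written out explicitly. The identity below is the standard one for the 2 × 2 case, given here as a reminder and not as part of the original argument:

```latex
\chi^2 \;=\; n\,\phi^2 \;=\; \frac{n\,(ad - bc)^2}{(a+b)\,(c+d)\,(a+c)\,(b+d)}
```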


If we turn to the problem of r_t and C₂, considered merely as coefficients of
association, we must examine what Mr Yule has laid down as the desiderata of such 








a coefficient. The fourfold table being

    a | b
    c | d

he has assumed that the coefficient must range from −1 to +1, and that if any one
of the cells be empty the coefficient must be +1 or −1. He then guesses a formula
(ad − bc)/(ad + bc) or another (√ad − √bc)/(√ad + √bc) out of the many thousands
which can be invented*.


But is either of the assumptions above really necessary? Why should the 
association be perfect when b=0? Why should it be perfect even if both b and c 
are zero? Let us toss a shilling and a penny and record heads or tails of both. 











                        Shilling.
                 | Head | Tail | Totals
Penny.  Head     |  1   |  0   |   1
        Tail     |  2   |  1   |   3
        Totals   |  3   |  1   |   4





We do it four times and the result is as above, on the whole not a very improbable 
result. But according to Mr Yule there is absolute association, and since the 
probable error according to him is zero, the result is absolutely reliable. Clearly


* See Pearson, Phil. Trans. Vol. 195 A, p. 15. If φ(z) be a function which vanishes with z,
then any form

{φ(ad) − φ(bc)}/{φ(ad) + φ(bc)}


satisfies Yule’s requirements. Or, we can take a form 


1 

1-« % (« * Kk 
““T+n o(@) 

if @(x) be finite for x=o, where x=(bc)/(ad). Clearly for the range 0 to © of values of x, Q, ranges 
from +1 to —1 and satisfies Mr Yule’s conditions if ¢(#) is > (x), but by an arbitrary choice of ¢ we 
can get any form of Q-curve we please. No curve of real significance can be obtained, i.e. no reason- 
able value of an association coefficient by the simple condition of fixing three of its values without 
other hypothesis ! 





Biometrika IX 23

φ would be better than this, for it is equal to ·33 with a probable error of ·15
which makes it for practical purposes unreliable. Or again, we repeat our experiment
and find
                        Shilling.
                 | Head | Tail | Totals
Penny.  Head     |  2   |  0   |   2
        Tail     |  0   |  2   |   2
        Totals   |  2   |  2   |   4


Here Q is again unity or there is absolute association, and the probable 
error is zero or this association according to Mr Yule is also absolutely reliable. 
Further φ = 1, and its apparent probable error is zero, but this is only apparent
because the calculation of the probable error (as indeed of that of Mr Yule’s Q, 
although he has not noticed it) is incorrect for such a case. 


These simple illustrations seem to indicate that there is nothing in the nature of 
things which necessarily demands that the association shall be unity when either 
b or c or both are zero. On the other hand the probability P (as derived from φ²,
the mean square contingency) that the heads and tails of shilling and penny are
independent is more than ·90, and in the second case more than ·25. It seems to
us therefore that this mean square contingency method which gives reasonably 
satisfactory results, where the Boas-Yulean goes hopelessly astray, is far more 
likely to be preferable in the case of medical treatment to which Mr Yule proposes 
to apply his coefficient. 
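
The figures quoted for the two coin experiments can be checked with a short calculation. This is a sketch under stated assumptions: the second table is read as 2, 0 / 0, 2, and P is taken as the upper tail of χ² with three degrees of freedom, which is what entering the goodness-of-fit tables with n′ = 4 amounts to. The function names are illustrative.

```python
import math

def phi_and_q(a, b, c, d):
    """Return (phi, Q) for the fourfold table [[a, b], [c, d]]."""
    phi = (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    q = (a * d - b * c) / (a * d + b * c)
    return phi, q

def p_goodness_of_fit_n4(chi2):
    """Upper-tail probability of chi-square with 3 degrees of freedom.

    Assumption: this matches the older goodness-of-fit tables entered
    with n' = 4 groups, as the text appears to do.
    """
    half = chi2 / 2
    return math.erfc(math.sqrt(half)) + math.sqrt(2 * chi2 / math.pi) * math.exp(-half)

# Cells are (a, b, c, d); rows are the penny, columns the shilling.
first = (1, 0, 2, 1)
second = (2, 0, 0, 2)   # assumed reading of the second table

summary = {}
for name, cells in (("first", first), ("second", second)):
    n = sum(cells)
    phi, q = phi_and_q(*cells)
    chi2 = n * phi ** 2
    summary[name] = (phi, q, p_goodness_of_fit_n4(chi2))
    print(f"{name}: phi = {phi:.2f}, Q = {q:.2f}, P = {summary[name][2]:.2f}")
```

Both tables give Q = 1, while P stays above ·90 and ·25 respectively. Under the modern convention of one degree of freedom for a fourfold table the same χ² values would give smaller probabilities.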


There is another point also to be considered. Why should the range of a 


good coefficient of association lie for any given number of cells between +1 and 
—1? Let us examine the following table : 























(A)     |   a |   b |   c |   d |   e |   f |   g |   h |   i |   j | Totals
α       |  40 |   0 |   0 |   0 |   0 |   0 |   0 |   0 |   0 |   0 |     40
β       |   0 |   0 | 200 |   0 |   0 |  10 | 110 |  10 |  30 |   0 |    360
γ       |   0 |   0 |   0 |   0 |   0 |  10 | 440 |   0 | 120 |   0 |    570
δ       |   0 |   0 |   0 |   0 |   0 |  10 |  20 |   0 |   0 |   0 |     30
ε       |   0 |   0 |   0 |  20 |  10 |   0 |   0 |   0 |   0 |   0 |     30
ζ       |   0 | 120 |   0 | 440 |  10 |   0 |   0 |   0 |   0 |   0 |    570
η       |   0 |  30 |  10 | 110 |  10 |   0 |   0 | 200 |   0 |   0 |    360
θ       |   0 |   0 |   0 |   0 |   0 |   0 |   0 |   0 |   0 |  40 |     40
Totals  |  40 | 150 | 210 | 570 |  30 |  30 | 570 | 210 | 150 |  40 |   2000

i | | 








If we treat each sub-range here as unity we find the correlation negative and 
equal to −·1120. If therefore we assume this to be the ultimate distribution, this
is the correlation coefficient.

Now combine b and c, h and i. Proceeding in the same way to find the Yulean
pseudo-ranks r, we have now for the following table:





























(B)     |   a | b+c |   d |   e |   f |   g | h+i |   j | Totals
α       |  40 |   0 |   0 |   0 |   0 |   0 |   0 |   0 |     40
β       |   0 | 200 |   0 |   0 |  10 | 110 |  40 |   0 |    360
γ       |   0 |   0 |   0 |   0 |  10 | 440 | 120 |   0 |    570
δ       |   0 |   0 |   0 |   0 |  10 |  20 |   0 |   0 |     30
ε       |   0 |   0 |  20 |  10 |   0 |   0 |   0 |   0 |     30
ζ       |   0 | 120 | 440 |  10 |   0 |   0 |   0 |   0 |    570
η       |   0 |  40 | 110 |  10 |   0 |   0 | 200 |   0 |    360
θ       |   0 |   0 |   0 |   0 |   0 |   0 |   0 |  40 |     40
Totals  |  40 | 360 | 570 |  30 |  30 | 570 | 360 |  40 |   2000














the value is positive and equal to + ·0050.

Now club d and e, f and g, γ and δ, ε and ζ together, and we have

(C)     |   a | b+c | d+e | f+g | h+i |   j | Totals
α       |  40 |   0 |   0 |   0 |   0 |   0 |     40
β       |   0 | 200 |   0 | 120 |  40 |   0 |    360
γ+δ     |   0 |   0 |   0 | 480 | 120 |   0 |    600
ε+ζ     |   0 | 120 | 480 |   0 |   0 |   0 |    600
η       |   0 |  40 | 120 |   0 | 200 |   0 |    360
θ       |   0 |   0 |   0 |   0 |   0 |  40 |     40
Totals  |  40 | 360 | 600 | 600 | 360 |  40 |   2000


The Yulean pseudo-ranks r is now + ·2562.

Combine a and b + c, h + i and j, α and β, η and θ, and we find

(D)     | a+b+c | d+e | f+g | h+i+j | Totals
α+β     |   240 |   0 | 120 |    40 |    400
γ+δ     |     0 |   0 | 480 |   120 |    600
ε+ζ     |   120 | 480 |   0 |     0 |    600
η+θ     |    40 | 120 |   0 |   240 |    400
Totals  |   400 | 600 | 600 |   400 |   2000


The Yulean now drops down to + ·1429!

Combine d + e and f + g, γ + δ and ε + ζ to give

(E)       | a+b+c | d+e+f+g | h+i+j | Totals
α+β       |   240 |     120 |    40 |    400
γ+δ+ε+ζ   |   120 |     960 |   120 |   1200
η+θ       |    40 |     120 |   240 |    400
Totals    |   400 |    1200 |   400 |   2000








The Yulean is now + ·5000.

Combine b + c, d + e, f + g, h + i together; β, γ + δ, ε + ζ, η, and we deduce

(F)           |  a | b+c+d+e+f+g+h+i |  j | Totals
α             | 40 |               0 |  0 |     40
β+γ+δ+ε+ζ+η   |  0 |            1920 |  0 |   1920
θ             |  0 |               0 | 40 |     40
Totals        | 40 |            1920 | 40 |   2000








The Yulean has now reached perfect correlation, or r = + 1·0000.

Combine a, b, c, d, e together and f, g, h, i, j; α, β, γ, δ and ε, ζ, η, θ, and we
have

(G)       | a+b+c+d+e | f+g+h+i+j | Totals
α+β+γ+δ   |       240 |       760 |   1000
ε+ζ+η+θ   |       760 |       240 |   1000
Totals    |      1000 |      1000 |   2000











The Yulean is now negative and − ·5200.

But if we write the table as a fourfold thus:

(H)             | a+b+c+d+e+f+g+h+i |  j | Totals
α+β+γ+δ+ε+ζ+η   |              1960 |  0 |   1960
θ               |                 0 | 40 |     40
Totals          |              1960 | 40 |   2000











the Yulean would be again unity and mark perfect correlation of a positive kind. 
Mr Yule’s coefficient of association would also be positive and mark perfect 
association. 
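
The effect of clubbing the classes can be reproduced numerically. The sketch below assumes that the quantity quoted for each table is the ordinary product-moment correlation with every class scored one unit apart, which agrees with the values printed for (A), (C), (D), (E), (F), (G) and (H). The entries of Table (A) are as reconstructed from its margins, and the helper names are illustrative.

```python
import math

# Table (A): rows alpha..theta, columns a..j.
TABLE_A = [
    [40,   0,   0,   0,  0,  0,   0,   0,   0,  0],
    [ 0,   0, 200,   0,  0, 10, 110,  10,  30,  0],
    [ 0,   0,   0,   0,  0, 10, 440,   0, 120,  0],
    [ 0,   0,   0,   0,  0, 10,  20,   0,   0,  0],
    [ 0,   0,   0,  20, 10,  0,   0,   0,   0,  0],
    [ 0, 120,   0, 440, 10,  0,   0,   0,   0,  0],
    [ 0,  30,  10, 110, 10,  0,   0, 200,   0,  0],
    [ 0,   0,   0,   0,  0,  0,   0,   0,   0, 40],
]

def merge(table, row_groups, col_groups):
    """Pool the listed row and column classes into broader categories."""
    return [[sum(table[i][j] for i in rows for j in cols) for cols in col_groups]
            for rows in row_groups]

def unit_spacing_r(table):
    """Product-moment correlation when class k is scored simply as k."""
    n = sum(map(sum, table))
    cells = [(i, j, f) for i, row in enumerate(table) for j, f in enumerate(row)]
    mean_i = sum(f * i for i, j, f in cells) / n
    mean_j = sum(f * j for i, j, f in cells) / n
    s_ii = sum(f * (i - mean_i) ** 2 for i, j, f in cells)
    s_jj = sum(f * (j - mean_j) ** 2 for i, j, f in cells)
    s_ij = sum(f * (i - mean_i) * (j - mean_j) for i, j, f in cells)
    return s_ij / math.sqrt(s_ii * s_jj)

def groups(spec):
    """'0|12|3' -> [[0], [1, 2], [3]] (single-digit class indices only)."""
    return [[int(ch) for ch in part] for part in spec.split("|")]

# Row pooling and column pooling for each lettered table.
POOLINGS = {
    "A": ("0|1|2|3|4|5|6|7", "0|1|2|3|4|5|6|7|8|9"),
    "C": ("0|1|23|45|6|7", "0|12|34|56|78|9"),
    "D": ("01|23|45|67", "012|34|56|789"),
    "E": ("01|2345|67", "012|3456|789"),
    "F": ("0|123456|7", "0|12345678|9"),
    "G": ("0123|4567", "01234|56789"),
    "H": ("0123456|7", "012345678|9"),
}

results = {name: unit_spacing_r(merge(TABLE_A, groups(rs), groups(cs)))
           for name, (rs, cs) in POOLINGS.items()}
for name, r in results.items():
    print(f"({name}) r = {r:+.4f}")
```

The same 2000 observations thus run from a small negative value through zero to unity and back to −·52, purely according to where the class boundaries are drawn.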


Now it is not open to Mr Yule to turn round and assert that such tables are
extremely unlikely in practical statistics, first because his condemnation of the
coefficient of contingency is based solely on the creation of an artificial table
in exactly the same way, and secondly because he asserts that once we dismiss
the idea of Gaussian frequency the method of correlating ranks with big ‘brackets’
becomes applicable. Our tables bring out, however, three important points:
(i) that two variates with an actual correlation of −·112 may exhibit any correlation
between −·52 and +1·00 when treated by the Yulean process of pseudo-ranks;
(ii) that Mr Yule’s coefficient of association may cover under the heading
‘perfect association’ almost any value of the real relationship, it is merely an
association of common names, i.e. class-indices, and not of the real variate beneath
these class-indices; and (iii) that the assumption that a fitting measure of
association should give unity for a fourfold table of the form

    a | 0
    0 | d

is by no means obvious. It is conceivable that a better measure of association would give
a limit below unity for such a case, while providing a limit nearer and nearer to 
unity as the information given with regard to what really occurs inside the broad
categories is more and more complete. That is to say, a desirable coefficient of 
association would be one which would always lie numerically between 0 and 1, 
but which would not take the absolute value 1, unless far more detailed infor- 
mation were provided than is given in the statement of such a table as (H). 
From this standpoint we see at once how idle is Mr Yule’s criticism of the co- 
efficient of contingency. He suggests that it is invalid because (i) it has an upper 
limit less than unity, when the contingency table has a limited number of cells, 
and (ii) its value rises when you increase the number of cells. It is less than unity
in the first case, because we are ignorant of what may happen when we analyse 
the contents of the big cells; it increases in the second case because we have 
additional knowledge. It only becomes unity when one character A is absolutely 
fixed by a second B, i.e. when A is a function of B*. The coefficient of con- 
tingency is a valid measure of association, whether the table be fourfold or 
n × n′-fold. It presents far fewer logical anomalies than Mr Yule’s Q or the Boas-
Yulean φ, and it readily admits of our calculating, what for many cases is essential,
the probability that the two attributes are independent. 


But Mr Yule dismisses the coefficient with (a) a quite unreasoned criticism 
that it increases in value as the number of cells increases, and (b) an illustration
that it is not equal to the coefficient of correlation for one particular table of 
heterogeneous material, i.e. for a surface of zero correlation with a cock’s comb
of absolute correlation erected along its diagonal. Nobody, as far as we are aware, 
ever asserted it would be. The assertions made with regard to the coefficient of 
mean square contingency may be summed up as follows: (i) for any frequency 
distribution the coefficient of contingency is a reasonable measure of the extent 
of the deviation of the attributes from absolute independence, and (ii) for such 
frequency surfaces in homogeneous material as occur in actual practice the coefficient
of contingency, if the proper corrections are made‡, gives a value close to
the coefficient of correlation, whether we divide the table up into 3 x 3-fold or 
8 x 8-fold groupings. The skewness of the distribution—its deviation from Gaussian 
frequency—is not a very disturbing factor, as we shall show in the sequel. When we 
take material which has—if there be an indefinite number of cells—an indefinitely 
great improbability of independence, i.e. material for which C₂ = 1, we shall not


* See Pearson, Grammar of Science, 3rd ed. p. 162. 
† The probable error of a coefficient of contingency C₂ for a fourfold table is
1 9 4 
67449 1 {1 — 202+ Cs (1-02), — 30.2 (V2 +} iz 
Jn (1- €,2)2 
where λ and μ have the same values as on p. 170. It does not become zero when the Boas-Yulean φ
is equal to unity, unless λ = μ = 0.
‡ These corrections have been several times referred to (see Grammar of Science, ed. 1911, ftn.
p. 163) and have been in use in the Laboratory, but the further memoir on Contingency which has been
for some time in hand has been delayed owing to pressure of other work. It will shortly be issued and
deal more fully with the corrections merely stated in this paper.

reduce this infinite improbability by diluting it with a finite amount of non-correlated
material; and this approach of C₂ to unity is all that Mr Yule’s artificial
cock’s comb surface illustrates. If A is an infinitely improbable event, then we 
shall not lessen the improbability of the whole by combining the event A with B 
which has a zero improbability * ! 


Mr Yule cannot determine the efficiency of contingency methods by simply 
asserting that the value of the mean square contingency depends on the number of 
cells; it naturally alters with our increased knowledge, but this change may mark 
either an increase or a decrease according to the manner in which the material in 
the few cells is redistributed in their component cells. We assert that with a 
3 x 3-fold table you cannot get further than the contingency, and that further pro- 
gress can only be made by some other assumption as to the frequency distribution 
of the variates. With that assumption we think we shall be able to demonstrate in 
the course of this paper to the unprejudiced reader that the coefficient of contingency 
properly handled is, perhaps, the most powerful instrument of modern statistical 
theory. The assumption we make is that for correcting the results obtained by con- 
tingency, so that coefficients found for 3 x 3-fold, 4x 4-fold, 5 x 5-fold, ..., 8 x 8-fold 
tables may give practically identical results, it is sufficient to deduce the required 
corrections by using a Gaussian hypothesis to determine certain means. The method 
gives excellent results for the bulk of the distributions which occur in our wide 
experience of statistical work. If we can show that it gives good results for the 
extremely skew cases which Mr Yule has gone out of his way to cite, our point 
will be proved. Since the full development of the contingency method, fourfold 
tables have not been used by the Biometric School, except as controls, where 
contingency tables could be formed on the given data. But the statement that
contingency was developed in order to overcome the difficulties of the fourfold 
table methods is directly disregarded by Mr Yule when he turns to our pigmenta- 

* A is the probability that in examining two absolutely independent variates, n cells shall be
occupied and n² − n cells empty when we make n infinite. B is the probability that the n² cells shall
each have their theoretically independent contents. No combination of these two events can give less
than an indefinitely great improbability, i.e. C₂ = 1. But we anticipate that if Mr Yule had not raised
his cock’s comb at such a conspicuous angle to the rest of his surface that its heterogeneity would
be readily visible to the trained statistician, there would be no very serious error introduced by
applying the mean square coefficient of contingency even to moderately heterogeneous material. We
have not had leisure to investigate the matter closely, but if we superpose two Gaussian frequency-surfaces
with identical means, with the same standard deviations for both variables, and with
correlations r₁ and r₂, then the true correlation by product moment is

ρ = p r₁ + q r₂,
where pN and qN are the total frequencies of the two components and p + q = 1.

On the other hand 

Pee »/ : pr(L+r72) —2prire (1+ rg) +7272 (1+7172) 2 

“Mp? (1+ 7p 172) — 2pryre (ry +72) + ry? re? (1+ 1p 72) + (1-112) (1 — ro?) (1 - 7) 172) ° 
If r₁ = 1, r₂ = 0, as we know C₂ = 1, while ρ = p. But if r₁ = ·2, r₂ = ·7, with p = ·3, q = ·7, the mixture
proportions of Mr Yule’s illustration, then ρ = ·55 and C₂ = ·59. If r₁ = ·5, r₂ = ·7, p = ·3, q = ·7, then ρ = ·64
and C₂ = ·65. Again if r₁ = ·7, r₂ = ·3, p = ·4, q = ·6, then ρ = ·46 and C₂ = ·49. Thus it does not appear
that small amounts of heterogeneity, not detectable on a study of the table, are likely to give very
misleading values when C₂ is taken as a practical measure of ρ.

tion data*. The recognition, however, that the fourfold table may give discordant
results—a recognition made by the Biometric School within four years of the 
publication of the pigmentation investigations for eye-colour in man and coat- 
colour in horses—does not dismiss the fourfold table from practical statistics, but 
only from that portion of it where multiple contingency tables are available. 
Given a fourfold classification alone, how is it to be treated? We reply unhesitatingly
that in the great bulk of cases the use of tetrachoric r_t is the best treatment.
We base this on the experience that where nothing is known the Gaussian
is far more likely to describe approximately the frequency than any other hypothesis. 
Even taken as a mere coefficient of association, tetrachoric r_t is better than
Mr Yule’s Q or the Boas-Yulean φ, except for absolutely discrete units as in
the purely theoretical Mendelian cases; and in those cases the correlation of ranks
is the correlation of variates, as Pearson indicated in his memoir of 1904†.
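
The numerical cases quoted in the footnote on superposed Gaussian surfaces can be checked with a short calculation. The sketch below assumes two standardised Gaussian components with common margins, and uses the identity that the integral of f_i f_j over the product of the margins equals 1/(1 − r_i r_j), so that 1 + φ² is a weighted sum of such terms and C₂² = φ²/(1 + φ²). This closed form is offered as an equivalent route, not as a transcription of the printed expression, and the function name is illustrative.

```python
import math

def mixture_rho_and_c(p, r1, r2):
    """Correlation and contingency coefficient of a two-component mixture.

    p  : proportion of the component with correlation r1 (q = 1 - p has r2).
    Assumes |r1| < 1 and |r2| < 1 and identical Gaussian margins.
    """
    q = 1.0 - p
    rho = p * r1 + q * r2
    # 1 + phi^2: sum over component pairs of weight_i * weight_j / (1 - r_i * r_j)
    one_plus_phi2 = (p * p / (1 - r1 * r1)
                     + 2 * p * q / (1 - r1 * r2)
                     + q * q / (1 - r2 * r2))
    c = math.sqrt(1 - 1 / one_plus_phi2)
    return rho, c

cases = [(0.3, 0.2, 0.7), (0.3, 0.5, 0.7), (0.4, 0.7, 0.3)]
values = [mixture_rho_and_c(*case) for case in cases]
for (p, r1, r2), (rho, c) in zip(cases, values):
    print(f"p={p}, r1={r1}, r2={r2}: rho = {rho:.2f}, C = {c:.2f}")
```

With r₁ = r₂ = r the expression reduces to C₂ = |r|, which is the homogeneous Gaussian case.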





(5) On the Surface of Constant Association and on “ Natural” Equalisation. 


As we have indicated, Mr Yule never states adequately the conditions under 
which his coefficients of association and colligation are to be applied. He 
apparently considers the nature of the continuity of his frequency surface, if his 
variates are continuous, to be absolutely immaterial. Now in the case of every 
two continuous variates, whatever their nature, a frequency surface does exist for 
which Mr Yule’s association or colligation is constant wherever the divisions may 
be taken upon which the fourfold table is based. Let $n_{pq}$ be all the first quadrant
frequency corresponding to the total frequencies p and q of the two variates, where
p and q are supposed to be absolutely known. If the association coefficient be Q
and the notation

    a         | b     | a + b = q
    c         | d     | c + d
    a + c = p | b + d | N

we have (1 + Q)/(1 − Q) = ad/bc = χ, say.


* In the very same number of Biometrika (Vol. III. 1904) in which the Huxley Lecture appeared, there
is a paper on the inheritance of pigmentation in the Greyhound; it is the work of Pearson’s Laboratory
and started about the same time as the Huxley Lecture reductions. The following words occur:
“When we first started work on the greyhounds, the method of contingency had not been developed, and
accordingly we made tables for the inheritance of melanism and of red pigment and proceeded to find
the correlations by the fourfold division process” (l. c. p. 252). And again “In order to compare the
fourfold method with contingency methods, 16-fold tables and 25-fold tables were worked out to
compare with the fourfold tables adopted for the inheritance of red and black pigment respectively”
(p. 253). “The results deduced by contingency D method are singularly uniform and steady as
compared with those of the fourfold-table methods, and we believe, if it be adopted generally for
such pigmentation problems, it will not only free us from any question of pigmentation scale, but
afford a good result on a not excessive expenditure of calculating energy” (p. 253). It is clear that
the Laboratory publicly admitted the difficulties of the fourfold-table method two years before Mr Yule
started to criticise it as applied to pigmentation data! Yet Mr Yule never mentions this fact.

† “On a Generalised Theory of Alternative Inheritance with special reference to Mendel’s Laws,”
Phil. Trans. Vol. 203, p. 53.

Then a = $n_{pq}$, b = q − $n_{pq}$, c = p − $n_{pq}$, d = N − p − q + $n_{pq}$.
Therefore

$$(\chi - 1)\,n_{pq}^2 - n_{pq}\{(\chi - 1)(p + q) + N\} + pq\chi = 0$$

is a quadratic to find $n_{pq}$. If we take as we always can arrange to do Q positive,
then χ lies between 1 and ∞ or is also positive. The equation for $n_{pq}$ will have
real roots if

$$\{(\chi - 1)(p + q) + N\}^2 > 4\chi(\chi - 1)\,pq,$$

or, if

$$(\chi - 1)^2(p - q)^2 + (\chi - 1)(p - q)^2 + (2N - p - q)(\chi - 1)(p + q) + N^2 > 0,$$

which is true since χ > 1 and 2N > p + q. Hence, since χpq/(χ − 1) is always
positive, the quadratic has always one and only one real positive root.

Further:

$$\frac{\partial n_{pq}}{\partial p} = \frac{\chi q - n_{pq}(\chi - 1)}{N + (p + q - 2n_{pq})(\chi - 1)},$$

but χ is > χ − 1, q > $n_{pq}$, and p > $n_{pq}$, therefore it follows that both numerator and
denominator are positive, or $n_{pq}$ increases with p. Similarly it increases with q,
or in subtracting $n_{pq}$ from either $n_{p+\delta p,\,q}$ or $n_{p,\,q+\delta q}$ we shall never reach a negative
difference. Thus it is always possible to construct a surface for which Q is 
constant for every fourfold division. It seems to us that had Mr Yule realised 
the possibility of this surface and studied it*, he would have known more about 
the real properties of his Q and its bearing on such distributions as occur in 
practice. The fact that in every distribution of continuous variates we come 
across there is no approach to constancy in Q, that it varies continuously and 
almost in a predictable manner shows how very far the surface of constant Q is 
from representing the facts of experience. Still had Mr Yule fitted the best 
surface of constant Q to a known distribution of detailed data, and so ascertained 
his value of Q, he would have given us a coefficient which would have lived in 
statistical practice and theory, and he would have thrown real light on the relation- 
ship of association to correlation. We have not spent time in discussing the 
complicated equation to the surface of constant Q, but we have provided one 
illustration of such a surface. Taking the total frequencies of each eye-colour 
group in Father and Son, only adding 5 and 6 together, we have Table I given 
below for the eye-colour distribution categories; this table would within the 
limits of our decimal places have the same coefficient of association, Q = 0·6,
wherever we divide it into fourfold tables. For example, taking both divisions 
between 2 and 3 we have the fourfold 


    191·55 | 143·45 |  335
    166·45 | 498·55 |  665        Q = ·59996.
    358    | 642    | 1000

* We suggest that Mr Yule has not studied the matter, for he writes: “There is one case and one only
where Q is independent of the axes chosen, and that is where the variables are strictly independent,”
Phil. Trans. Vol. 194, p. 278. For the equation to the surface see Appendix III.

Or, again, between 6 and 7 vertically and 3 and 4 horizontally

    576·60 |  42·40 |  619
    294·40 |  86·60 |  381        Q = ·60002,
    871    | 129    | 1000

which sufficiently illustrate the feasibility of the construction of the surface.
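
The construction of a cell of the constant-Q surface can be sketched as follows, assuming 0 < Q < 1 and taking the smaller root of the quadratic, which is the one lying below both p and q. The function names are illustrative and not part of the original paper.

```python
import math

def first_quadrant_frequency(p, q, n_total, big_q):
    """Solve the quadratic for n_pq so that the fourfold table cut at
    marginal totals p and q has association coefficient big_q (0 < big_q < 1)."""
    chi = (1 + big_q) / (1 - big_q)           # chi = ad/bc
    a_coef = chi - 1
    b_coef = (chi - 1) * (p + q) + n_total
    c_coef = chi * p * q
    disc = b_coef ** 2 - 4 * a_coef * c_coef
    # The smaller root keeps n_pq below both p and q.
    return (b_coef - math.sqrt(disc)) / (2 * a_coef)

def fourfold(p, q, n_total, n_pq):
    """Cells (a, b, c, d) implied by n_pq and the marginal totals."""
    return n_pq, q - n_pq, p - n_pq, n_total - p - q + n_pq

def yule_q(a, b, c, d):
    return (a * d - b * c) / (a * d + b * c)

# Division between classes 2 and 3 on both variates: p = 358, q = 335.
n1 = first_quadrant_frequency(358, 335, 1000, 0.6)
# Division between 6 and 7 vertically and 3 and 4 horizontally: p = 871, q = 619.
n2 = first_quadrant_frequency(871, 619, 1000, 0.6)

print(round(n1, 2), round(n2, 2))
print(yule_q(*fourfold(358, 335, 1000, n1)))
```

The two roots agree with the first cells of the fourfold tables printed above to within the rounding of the second decimal place.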


But at once the fictitious character of Mr Yule’s idea of association comes to 
light. He tells us that it is “natural” to put a table in the equivalent symmetrical 
form, one for example in which we have 50 % deaths and 50 % recoveries,
50 % vaccinated and 50 % unvaccinated. And this is possible because his Q does
not change by such a process. But surely if it is “natural” to have equal 
numbers of deaths and recoveries, it is also “natural” to have equal numbers of 
fathers, and for the matter of that equal numbers of sons, in each eye-class. 
Equal light blue eyes, and equal dark blue eyes, equal light brown eyes, equal 
dark browns and blacks. But the instant we study a table of this kind some 


TABLE I.

Table of Variates in Father and Son Eye-Classes for Constant Association
Coefficient 0·6.

         |     1 |      2 |     3 |     4 |   5+6 |      7 |    8 | Totals
1        |  4·08 |  19·04 |  6·25 |  2·60 |  0·78 |   1·00 | 0·25 |     34
2        | 19·54 | 148·89 | 75·33 | 32·54 |  9·68 |  12·05 | 2·97 |    301
3        |  7·41 |  90·46 | 90·17 | 52·97 | 16·86 |  21·03 | 5·10 |    284
4        |  2·19 |  28·54 | 39·43 | 33·19 | 12·53 |  16·89 | 4·23 |    137
5+6      |  1·35 |  17·12 | 25·63 | 26·40 | 11·82 |  17·83 | 4·85 |    105
7        |  1·03 |  13·15 | 19·94 | 23·33 | 12·18 |  21·60 | 6·77 |     98
8        |  0·40 |   4·80 |  7·25 |  8·97 |  5·15 |  10·60 | 3·83 |     41
Totals   |    36 |    322 |   264 |   180 |    69 |    101 |   28 |   1000



remarkable points arise. (i) The selection which is advocated by Mr Yule and which 
keeps his coefficient constant is a perfectly arbitrary one, and (ii) it must be stopped 
at a perfectly artificial limit, namely that at which the arbitrary division has been 
made. Mr Yule does not equalise each sub-group, but if, for example, he divides into 
light and dark eyes between 3 and 4, he multiplies his light blue eyes, his dark 
blue eyes and his greys, not by different factors which would equalise these groups 
with each other, but by the same common factor, so that the sub-groups really 
remain in the same proportions as before the equalising. Surely if it is “natural”
to equalise light and dark eye totals, it is equally “natural” to equalise the totals
of light blue and dark blue eyes. If it is “natural” to equalise the vaccinated
and unvaccinated totals, it is surely for the same reason “natural” to equalise the
several groups of individuals who have one, two, three vaccination scars, for the
number of their scars “is dependent to a large degree on a purely arbitrary
circumstance, the activity of the authorities” in making 1, 2, or 3 punctures.
It is equally “natural” to equalise the numbers who have lived 5, 10, 20, 30
or 40 years since their last vaccination, for this “is dependent to a large degree
on a purely arbitrary circumstance,” the question of whether the population has 
been recently alarmed and revaccinated. The efficiency of vaccination depends 
both on size of cicatrix and on interval since vaccination. Yet if Mr Yule makes 
these equalisations he will wholly change his association coefficient. That coefficient 
admits not of a natural selection, but only of a wholly artificial one, namely, one
which alters all sub-classes on one side of an arbitrary dividing line in the same 
ratio and not in their “natural” ratios. Every word Mr Yule has uttered with 
regard to equalisation and its “naturalness” applies equally to classifications in 
3, 4 or more groups. Why should the relationship of father to son in eye-colour
depend upon the number of light blue eyed fathers taken? It is “natural” 
to equalise them with the dark blue. But this is only possible if the multiplying 
of a row or column will not influence the result, and the reader has only to test 
on the above table how Q is changed when he starts such a selection. A few of 
the values we have obtained are Q = ·47, ·44, ·36, ·17 and ·04! The fact is that
Mr Yule’s is not a general selection, but a perfectly artificial one, which is 
not in the least “natural” when we analyse its effect on the constituents of a 
given class. 

It is as well to stop and inquire what influence this “natural” equalising
has on contingency tables. Below (Table II.) is given the table for eye-colour in
Father and Son reduced to five classes on account of the labour involved. The
equalising factors are

    For the Rows:            For the Columns:
    y₁₊₂ = 3·73512,          x₁₊₂ =  ·61949,
    y₃   = 3·92073,          x₃   =  ·81174,
    y₄   = 7·11764,          x₄   = 1·00000,
    y₅₊₆ = 7·47842,          x₅₊₆ = 2·38871,
    y₇₊₈ = 6·35625,          x₇₊₈ = 1·34437,

and the resulting table is given as Table III.
TABLE II.
Actual Eye-Colour Table for Father and Son in Five Classes.

                        Eye-Colour of Father*.
Eye-Colour of Son |  1+2 |   3 |   4 | 5+6 | 7+8 | Totals
1+2               |  194 |  70 |  41 |   9 |  21 |    335
3                 |   83 | 124 |  41 |  13 |  23 |    284
4                 |   25 |  34 |  55 |  11 |  12 |    137
5+6               |   27 |  12 |  19 |  24 |  23 |    105
7+8               |   29 |  24 |  24 |  12 |  50 |    139
Totals            |  358 | 264 | 180 |  69 | 129 |   1000








* See Phil. Trans. Vol. 195 A, p. 138.

TABLE III.

Table for Eye-Colour in Father and Son equalised, or put into
Mr Yule’s “Natural” Groups.

                        Eye-Colour of Father.
Eye-Colour of Son |    1+2 |      3 |      4 |    5+6 |    7+8 | Totals
1+2               | 448·88 | 212·23 | 153·14 |  80·30 | 105·45 |   1000
3                 | 201·60 | 394·65 | 160·75 | 121·76 | 121·24 |   1000
4                 | 110·24 | 196·44 | 391·47 | 187·02 | 114·83 |   1000
5+6               | 125·09 |  72·85 | 142·09 | 428·73 | 231·24 |   1000
7+8               | 114·19 | 123·83 | 152·55 | 182·19 | 427·24 |   1000
Totals            |   1000 |   1000 |   1000 |   1000 |   1000 |   5000

In this case the Q was not constant for all divisions at starting, and therefore 
there is no general standard. But Mr Yule tells us that his method of pseudo- 
ranks is the best known to him for such a table. The Yulean for this table before 
equalisation was ·403, after equalisation it is ·166! Q can change as much as from
·52 to ·43. It is only an “unnatural” equalisation which keeps Q constant.
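
The equalisation of every class total can be carried out by rescaling rows and columns in turn until all margins agree. The sketch below assumes that this alternate rescaling is an acceptable stand-in for whatever procedure was used to find the factors above; it arrives at the same kind of table (each cell multiplied by a row factor and a column factor) and reproduces the entries of Table III. The function name `equalise` is illustrative.

```python
# Table II: rows = eye-colour of son, columns = eye-colour of father.
TABLE_II = [
    [194,  70, 41,  9, 21],
    [ 83, 124, 41, 13, 23],
    [ 25,  34, 55, 11, 12],
    [ 27,  12, 19, 24, 23],
    [ 29,  24, 24, 12, 50],
]

def equalise(table, target=1000.0, sweeps=200):
    """Alternately rescale rows and columns until every margin equals target."""
    t = [[float(v) for v in row] for row in table]
    for _ in range(sweeps):
        for row in t:                       # make every row total equal target
            s = sum(row)
            for j in range(len(row)):
                row[j] *= target / s
        for j in range(len(t[0])):          # make every column total equal target
            s = sum(row[j] for row in t)
            for row in t:
                row[j] *= target / s
    return t

equalised = equalise(TABLE_II)
for row in equalised:
    print("  ".join(f"{v:7.2f}" for v in row))
```

Because only ratios of the factors matter, fixing one column factor at unity, as in the list of equalising factors, is a matter of convention.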

Let us consider another case of this, taken from the one field to which Mr Yule 
in his present paper ventures to apply his coefficient, that of vaccination *. 


TABLE IV.
Severity of Small Pox.

                              | Haemorrhagic | Confluent | Abundant | Sparse | Very Sparse | Totals
Vaccinated, years   0—10      |       0      |      1    |     6    |   11   |     12      |    30
since vaccination:  10—25     |       5      |     37    |   114    |  165   |    136      |   457
                    25—45     |      29      |    155    |   299    |  268   |    181      |   932
                    Over 45   |      11      |     35    |    48    |   33   |     28      |   155
Unvaccinated                  |       4      |     61    |    41    |    7   |      2      |   115
Totals                        |      49      |    289    |   508    |  484   |    359      |  1689

The equalisation factors are:

    For Rows:            For Columns:
    y₁ = 6·1300,         x₁ = 11·0816,
    y₂ =  ·3631,         x₂ =  1·5125,
    y₃ =  ·1496,         x₃ =  1·0000,
    y₄ =  ·6928,         x₄ =  1·0118,
    y₅ = 1·0697,         x₅ =  1·1659,

reckoning rows and columns from the left-hand top corner.


* See Biometrika, Vol. VII. p. 257.

After equalising the classes of this table we have the following one:
TABLE V.
Vaccination Grade and Intensity of Disease.

Severity of Small Pox.

                              | Haemorrhagic | Confluent | Abundant | Sparse | Very Sparse | Totals
Vaccinated, years   0—10      |      0       |    9·27   |   36·78  |  68·19 |    85·76    |   200
since vaccination:  10—25     |    20·12     |   20·32   |   41·39  |  60·59 |    57·58    |   200
                    25—45     |    48·08     |   35·07   |   44·74  |  40·54 |    31·57    |   200
                    Over 45   |    84·40     |   36·66   |   33·23  |  23·11 |    22·60    |   200
Unvaccinated                  |    47·40     |   98·68   |   43·86  |   7·57 |     2·49    |   200
Totals                        |     200      |    200    |    200   |   200  |     200     |  1000








Now here are some of the changes that take place in Q, owing to this 
equalisation of the classes : 


TABLE VI.

Vertical Division        | Horizontal Division     |  Old Q  |  New Q
Confluent-Abundant       | Vaccinated-Unvaccinated | −·7220  | −·7070
Abundant-Sparse          | Vaccinated-Unvaccinated | −·8599  | −·8945
Confluent-Abundant       | 25—45 and over 45       | −·5714  | −·7522
Confluent-Abundant       | 10—25 and 25—45         | −·6411  | −·8163
Abundant-Sparse          | 10—25 and 25—45         | −·4469  | −·7742
Haemorrhagic-Confluent   | 10—25 and 25—45         | −·5711  | −·7798
Haemorrhagic-Confluent   | 25—45 and over 45       | −·5449  | −·5861
Haemorrhagic-Confluent   | Vaccinated-Unvaccinated | −·1009  | −·1371
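
Entries of this kind can be recomputed by dichotomising Tables IV and V at the stated divisions. The sketch below assumes the cell values as reconstructed from the printed margins and takes Q in the form (ad − bc)/(ad + bc), with a the block above and to the left of the two divisions. The function name `q_at_cut` is illustrative.

```python
# Rows: vaccinated 0-10, 10-25, 25-45, over 45 years; then unvaccinated.
# Columns: haemorrhagic, confluent, abundant, sparse, very sparse.
TABLE_IV = [
    [ 0,   1,   6,  11,  12],
    [ 5,  37, 114, 165, 136],
    [29, 155, 299, 268, 181],
    [11,  35,  48,  33,  28],
    [ 4,  61,  41,   7,   2],
]
TABLE_V = [
    [ 0.00,  9.27, 36.78, 68.19, 85.76],
    [20.12, 20.32, 41.39, 60.59, 57.58],
    [48.08, 35.07, 44.74, 40.54, 31.57],
    [84.40, 36.66, 33.23, 23.11, 22.60],
    [47.40, 98.68, 43.86,  7.57,  2.49],
]

def q_at_cut(table, row_cut, col_cut):
    """Yule's Q for the fourfold table formed by dividing the rows before
    index row_cut and the columns before index col_cut."""
    rows, cols = range(len(table)), range(len(table[0]))
    a = sum(table[i][j] for i in rows for j in cols if i < row_cut and j < col_cut)
    b = sum(table[i][j] for i in rows for j in cols if i < row_cut and j >= col_cut)
    c = sum(table[i][j] for i in rows for j in cols if i >= row_cut and j < col_cut)
    d = sum(table[i][j] for i in rows for j in cols if i >= row_cut and j >= col_cut)
    return (a * d - b * c) / (a * d + b * c)

# (row_cut, col_cut): 4 = vaccinated | unvaccinated, 3 = 25-45 | over 45;
# 2 = confluent | abundant, 3 = abundant | sparse.
for row_cut, col_cut in [(4, 2), (4, 3), (3, 2)]:
    old_q = q_at_cut(TABLE_IV, row_cut, col_cut)
    new_q = q_at_cut(TABLE_V, row_cut, col_cut)
    print(f"cut ({row_cut}, {col_cut}): old Q = {old_q:+.4f}, new Q = {new_q:+.4f}")
```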


The range of values found for Q for a single table will be considered later ; 
it is sufficient here to indicate how largely the bulk of them are changed by 
equalisation of the groupings. If further evidence be needed for the radical 
changes which invariably accompany Mr Yule’s conception of a “natural” 
grouping, we may note that the Yulean deduced from pseudo-ranks is before 
equalisation −·3099, but after equalisation −·5375. These two equalisations of
class frequency show that it is impossible to predict a priori how the relationship 
of the two variates will be changed by the process; in the first the relationship 
was lowered by 59 % and in the second case raised by 73 % of its value as
estimated by the Yulean*! It is clear that only a fictitious type of equalisation 
has been used by Mr Yule. His independence of selection is only an artificial 
one; sub-groups within his categories are not equalised but only equally selected ; 
and further such selection must neither fall short of nor exceed the arbitrary 
division he has selected for his class. If for example we agree to equalise each 


* Consider as a last illustration the Table VII below. 





The correlation is ·37. The reader will find








age group of the vaccinated, surely a “natural” process, then we shall modify
the coefficient of association obtained by dividing between vaccinated and un- 
vaccinated. If we equalised the totals of vaccinated and unvaccinated, then Q 
would have changed, if the division were taken at an interval of 25 years since 


vaccination. The whole process is quite arbitrary, and we believe wholly without 
validity. 


(6) On the Coefficient of Association and the Assumption of Discrete Variates. 


Mr Yule has asserted that when we free ourselves from any necessary relation 
to the theory of normal correlation, the ordinary theory of correlation is applicable 
in its entirety to the 2 x 2-fold table. Apparently he holds that the same is true 
for his coefficients of association and colligation although these different methods
lead to diverse and often contradictory results, for the simple reason that selection 
vastly affects one and does not affect the other. He has not considered the 
surface of equal association, nor discussed the cases to which the “ four-point” 
surface, involved in using ¢ as Boas and he propose, can be legitimately applied. 
It is accordingly of interest to see what happens when the coefficient of association 
is applied to various types of frequency surface. 

It is no valid reply to criticisms based upon such an investigation to say that 
such frequency surfaces do not occur in practice. Mr Yule has never entered 
into any discussion of the character of the distributions to which he applies his 
association ; it is sufficient for him that they give a fourfold table, and he makes 
no appeal, as we do, to experience as a basis for the adoption of any coefficient. 
It is therefore possible to test his association against the clear idea of correlation 
on any distribution whatever. 


If Mr Yule replies that this is not fair treatment because the coefficient of
association applies to discrete quantities, we answer that he has never defined


it an amusing task to equalise the total frequencies; he will then discover that it takes an interesting
form, that of an old friend of Mr Yule’s, and the correlation will then be recognised as .50.


TABLE VII.

                                        First Variate.

Second  |     a |     b |     c |     d |     e |     f |     g |     h |     i |     j | Totals
Variate |       |       |       |       |       |       |       |       |       |       |
--------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------
   a    |    11 |    20 |    30 |    35 |    40 |    45 |    50 |    40 |    30 |     2 |    303
   b    |    20 |  4400 |   600 |   700 |   800 |   900 |  1000 |   800 |   600 |    40 |   9860
   c    |    30 |   600 |  9900 |  1050 |  1200 |  1350 |  1500 |  1200 |   900 |    60 |  17790
   d    |    35 |   700 |  1050 | 13475 |  1400 |  1575 |  1750 |  1400 |  1050 |    70 |  22505
   e    |    40 |   800 |  1200 |  1400 | 17600 |  1800 |  2000 |  1600 |  1200 |    80 |  27720
   f    |    45 |   900 |  1350 |  1575 |  1800 | 22275 |  2250 |  1800 |  1350 |    90 |  33435
   g    |    50 |  1000 |  1500 |  1750 |  2000 |  2250 | 27500 |  2000 |  1500 |   100 |  39650
   h    |    40 |   800 |  1200 |  1400 |  1600 |  1800 |  2000 | 17600 |  1200 |    80 |  27720
   i    |    30 |   600 |   900 |  1050 |  1200 |  1350 |  1500 |  1200 |  9900 |    60 |  17790
   j    |     2 |    40 |    60 |    70 |    80 |    90 |   100 |    80 |    60 |    44 |    626
--------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------
 Totals |   303 |  9860 | 17790 | 22505 | 27720 | 33435 | 39650 | 27720 | 17790 |   626 | 197399

discrete quantities, and we can only judge what he means by this term from seeing
the cases to which he has applied it. We shall therefore deal first with the
discussion of the cases to which Mr Yule has applied his methods, and then
consider their effects as applied to (a) continuous, (b) discrete variates.


Mr Yule opens his paper by observing* that if we classify objects into two 
classes only, for example “peas into yellow-seeded and green-seeded, or the 
members of any group of mankind into male and female, the resulting data are 
of the simplest possible form.” The data may be thus in the simplest possible 
form, but difficulties might occur even in such simple cases as those cited by 
Mr Yule. The classification even into yellow- and green-seeded peas is by no 
means so simple as Mr Yule suggests, and certain types of hermaphrodite in man 
might undoubtedly puzzle even Mr Yule’s powers of discrimination. To under- 
stand really what Mr Yule proposes to classify we must turn to the cases to which 
he has applied his method. They are as far as we can judge from an examination 
of his writings the following: (1) Good and Bad Temper, (2) Presence and Absence 
of the Artistic Faculty, (3) Stature in Man, (4) Tallness in Plants, (5) Mental 
Dulness, (6) Low Nutrition, (7) Defects in Development involving “size, form or 
proportioning of parts,” (8) Abnormal Nerve Signs, involving “abnormal actions, 
movements, and balances,” (9) Mental Derangement involving imbecility and
idiocy, (10) “ Blindness,” (11) Deaf-mutism, (12) Recovery and Death in the case 
of Smallpox, (13) Vaccination or non-vaccination, (14) Male and Female, (15) Cross 
and Self-Fertilisation, (16) Eye-Colour, (17) Colour of Flower, (18) Prickliness of 
Fruit. Mr Yule has probably used or suggested the use of the coefficient of 
association in other cases†. Looking through the above cases we see it is in
the rarest instances, possibly only in (14) and (15), that Mr Yule has confined 
himself to discrete variates. He has applied his coefficient of association over 
and over again to continuously varying quantities. Temper (1), artistic faculty (2),
stature (3), tallness (4) are all quantitatively determinable variates, even if
difficult in some cases to measure. One man has a better or worse temper than 
another; one man has a greater or less degree of artistic faculty; where the 
division between presence and absence of the faculty is put, or what is called good 
or bad temper is largely matter of personal equation. But nobody doubts the 
range of such variates; they are continuous, there is no sudden break. 


Now turn to the next four characters (5)—(8). No one who has studied the 
essential difficulty of defining what is feeble-minded will doubt the continuity of 
mental dulness. It is not a discrete character, but a continuous variate. There 
are certainly all grades of mental defect, and the groups idiot, imbecile, feeble- 
minded, “simple” are quite artificial. If the whole population were graded 
according to intelligence, the frequency curve would be continuous; no one knows 
whether it would be Gaussian or not, or whether it would be “humpy” towards 


* Journal R. S. S., Vol. LXXV. p. 579.
† He has apparently approved of its application, as we shall see later, to a long series of absolutely
continuous variates by Professor Niceforo.
the tails. If there is a gap between the defective and the normal members of the
same sibship, intermediates in plenty will be found elsewhere. The whole difficulty 
associated with the Government Bill for the care of the feeble-minded turns on 
the questions how and where to draw the line between the “normal” and “ feeble- 
minded.” Whatever may be finally done, it is quite certain that there will be 
no real distinction between two individuals who fall just one side and just the 
other of the dividing line. Even the personal equation of trained observers will vary 
in classification, and Mr Yule takes the untrained record of thousands of the laymen 
who make census returns as marking off in some manner a “ discrete” character of 
“mental defect.” Absolutely the same remark applies to nutrition. The boundary 
of “low nutrition” is also an arbitrary dichotomy in a continuous variate fixed 
by the personal equation of the observer. (7) tells its own tale, for it is based on 
continuously changing and measurable characters. (8) is less obvious, but not 
only in number, but in quantitative intensity “nerve signs” are really continuous. 
To these also we may add mental derangement (9); there is a very great range 
of variation in imbecility and idiocy as we have already indicated under mental 
dulness. If we turn to “ blindness” (10) the source of it may be most varied, but 
if we define it merely as the loss of the faculty of sight, there is, as the simple cases 
of either congenital or senile cataract might have shown Mr Yule, almost every 
intensity of the failure of sight. Even certain cases of albinism are to be found 
in the Blind Asylums and are educated as semi-blind. Semi-blindness is so well 
recognised that special schools have been started for the semi-blind by certain 
educational authorities. In Bristol out of 22 children sent to the Blind Asylum 
at the expense of the Education Committee 13 had some degree of vision, and 
7 could read large print with varying degrees of difficulty, and were able to do
some form of handwork by sight. Out of 75 children specially examined for eye
defects at Bristol in 1911, beyond those requiring glasses were 9 suitable for the
partially blind class*. Of (11) deaf-mutism, we are less competent to speak, but
we have been informed by the very best authorities it is far from an absolutely 
fixed condition and that it has a great variety of grades. The grades are more 
marked in the acquired than the congenital cases, but in the latter cases they 
vary with the cause of the congenital deafness. Cases even exist with slight 
degrees of deafness which would probably have been associated with mutism had 
the deafness been more considerable. Different degrees of hearing are found among 
deaf-mutes. There are scores among them who undoubtedly possess an amount of 
vowel hearing, and it helps the tone of their voices when they are being taught 
articulation; this is true, although they cannot distinguish speech without watching 
the lips and without their ears being within two or three feet of their interlocutor. 


* We have had before us the diary of a man who “went blind” in old age. At 70 the writing
is perfectly clear and legible. It closes at 80 with the words scrawled in an almost unreadable
trembling hand across the page: “I am now almost blind and with great difficulty I write this.
Oh! the misery I feel, no one comprehends it.” When was this man “blind” for a census return?
The category “blind” covers a vast range of cases of partial sight, and it would be hard to draw a
dividing line between “blind” and “normal” in such cases.
The number unable to articulate is negligible; all the deaf can make sounds and
probably 90% could articulate with more or less success. Those who acquire
deafness later in life retain their speech, but with impaired quality. The census
returns club together the congenital and acquired forms. Where then is the
“discrete” attribute? As for the blind so for the instruction of the semi-deaf,
special schools have been established by some educational authorities. There were
19 scholars reported on in the Bristol Education Committee’s Report for 1911 as
attending the semi-deaf school at Broad Weir. From “very deaf” to “very slight
deafness” we have every graduation of hearing from those that can only hear
under 1 yard to those who can hear at 6 yards. There is every variation in
speech from the “unintelligible,” “now beginning to use a little,” “voice very
weak,” up to “lisping nearly overcome” and “speech of good quality.” Speech-reading
forms an essential feature of the instruction, and the cases are those
of transitional deaf-mutism.

We entirely disagree with Mr Yule’s statements that such attributes as blind 
and seeing, deaf and not deaf, mentally deranged and not deranged, “if not 
absolutely discrete, are very largely distinct from each other” (loc. cit. p. 638), and
we wholly fail to follow his argument on this point. 


Mr Yule has apparently seized on recovery and death in cases of small-pox as 
discrete instances, but by Dr Macdonell and one of the present writers they were 
used to measure a continuous quantity—the severity of the attack. The data due 
to Dr Brownlee, and published by one of us in Biometrika, Vol. VII. p. 256, show
that when the severity of the attack is classed by such categories as haemorrhagic, 
confluent, abundant, sparse, very sparse, that variate is essentially continuous, and 
that the mortality is largely confined to the two highest classes. Again when the 
immunity conferred by vaccination is reclassed under “ unvaccinated,” vaccinated 
over 45 years ago, 25—45 years back, 10—25 and 0—10, we at once recognise 
that vaccination regarded as conferring immunity is an essentially continuous 
variate. The same notion of continuity comes in, if we classify severity by the 
number of pustules on the face, comparison being made with a standard series of 
photographs of typical cases, or again show that area of vaccination scar affects 
the extent of the immunity. Mr Yule can hardly be ignorant of all this work, 
yet he lightly chooses the vaccination data as good illustrations of his method*. 
The classifications of intensity of attack by number of pustules and of vaccination 
by period since vaccination show in general frequency surfaces of a rough “cocked 
hat” type, and dispel at any rate the notion that immunity and severity can be 
treated as discrete variables. It is only the confirmed Mendelian who would 
classify any pigmentation character, whether of eye or of coat or of flower, into 
two alternative groups. The intensity of pigment is undoubtedly a continuous 

* That he was, as long ago as 1899, acquainted with the varying intensity of vaccination is
conclusively proved by the following sentence from his memoir on ‘Association’ (Phil. Trans. Vol. 194 A,
p. 289): “The association between non-vaccination and attack is very high indeed for young children,
.8 to .9, but drops sharply to .5 (owing presumably to the waning protection of the vaccination made
in infancy) in the older age groups.” The italics are ours.
variate, when we consider the actual frequency of pigment granules, and it is
only confusing the issue when eyes are divided into two classes, those with both
posterior and anterior pigment, and those with only posterior pigment. We shall
return to the question of eye-colour later, as it is a case out of which Mr Yule
makes much capital, but which in his last paper is the only one he treats by his own
association methods. If we take “prickliness” in fruit, there is no evidence yet
that it has ever been properly measured, and that it would not prove to be a
continuous variate, much as “hairiness” proved to be in the case of Lychnis when 
actually measured by Weldon*. To sum up, there are not among the attributes 
used by Mr Yule to illustrate association coefficients any but those of sex and 
nature of fertilisation which can reasonably be considered discrete quantities, and 
in using even these he always couples them with characters which in our opinion 
are distinctly continuous variates. 


We shall therefore start this criticism of Mr Yule’s statistical investigations by 
indicating the fallacious nature of his coefficients of association and colligation as 
applied to continuous variates. We shall then deal with the question of discrete 
alternative variables, and show their absurdities in that case. Finally we shall 
show reason for questioning the details of the bulk of his memoir, which is not 
occupied with the discussion of his special coefficients at all, but in advocating a 
new empirical method which there is ample reason for considering equally fallacious. 


(7) On the Idleness of Mr Yule’s Coefficient of Association when applied 
to Continuous Variates. 


(a) The Need for either Knowledge or Hypothesis as to the Nature of the 
Frequency. 

Let us start from a very general case of two continuous variates. The first 
question that we require to answer is whether for a given value of one variate 
the mean value of the other variate changes. If we can find the mean values 
of the arrays of the second variate for given values of the first we obtain the 
regression line; should this be a straight line the correlation coefficient r is a
suitable measure of the relationship, if it be not then the correlation ratio η gives us
a measure, and η² − r² marks the deviation of the regression from linearity. In
cases where the regression lines are both straight and r = 0, it by no means
follows that the two variates are absolutely independent. The next essential
condition is the equal variability of the arrays, or, what is nearly the same thing,
that the probability of a combination of one variate x between x and x + δx with the
other variate y lying between y and y + δy is the product of the probabilities

* Biometrika, Vol. II. pp. 47-55. The danger of this sort of classification has come home very
emphatically to one of the present writers, who had tried to use the category of “short-muzzle”
as against “long-muzzle” in breeding dogs. He was convinced that “short-muzzle” was a dominant
character in the Mendelian sense, until he took actually to measuring the muzzles of the hybrids of
first and later generations, when the idleness of treating such categories as discrete quantities became at
once obvious.
of x lying between x and x + δx and of y lying between y and y + δy. This is the
true measure of independence of the two variates. It is always possible to test
the independence of two variates (i.e. the probability of their independence)
by aid of the mean square contingency and the use of Palin Elderton’s
Tables*. But there is a whole class of continuous variates for which the
regressions are linear and the correlation coefficient is zero, but which are yet
heteroscedastic, i.e. the arrays of the y-variate for a given x-variate are not similar
frequency distributions. Any frequency surface, with two planes of symmetry, one
perpendicular to the axis of measurement of each variate, is representative of this
class. Let DEFG be the oval contour line within which, for a given population N,
all the frequency lies. This may be the actual curve in which the frequency surface
cuts the plane of xy, or if the surface asymptotes to that plane, the contour within
which all individuals of the N observed actually lie. If the distribution were
Gaussian, this contour would be an ellipse. Generally let us suppose it any
non-re-entering oval curve.
Let the frequencies in the four classes made by divisions parallel to the axes
of the variates, i.e. to DOF and EOG, be represented by

    a | b
    c | d

which we have used throughout in preference to Mr Yule’s more cumbrous

    (AB) | (Aβ)
    (αB) | (αβ)

Then Mr Yule’s coefficient of association Q is (ad − bc)/(ad + bc); it is
unnecessary at this point to consider Mr Yule’s coefficient of colligation ω,
which is only a special function of Q.
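The definition of Q, and the remark that the colligation coefficient is only a function of Q, can be checked numerically. The sketch below is illustrative only: the colligation formula used is the usual one, (√ad − √bc)/(√ad + √bc), which this section does not print, and the fourfold table is a hypothetical one chosen for the example.

```python
from math import sqrt


def yule_q(a, b, c, d):
    """Coefficient of association Q = (ad - bc) / (ad + bc)."""
    return (a * d - b * c) / (a * d + b * c)


def colligation(a, b, c, d):
    """Coefficient of colligation in its usual form (an assumption here,
    since the text does not print the formula)."""
    root_ad = sqrt(a * d)
    root_bc = sqrt(b * c)
    return (root_ad - root_bc) / (root_ad + root_bc)


# Hypothetical fourfold table, chosen only for illustration.
a, b, c, d = 40, 10, 20, 30

q = yule_q(a, b, c, d)
w = colligation(a, b, c, d)

# Under the assumed definition, Q is recovered from the colligation alone.
q_from_w = 2 * w / (1 + w * w)

print("Q =", round(q, 4))
print("colligation =", round(w, 4))
print("Q recovered from colligation =", round(q_from_w, 4))
```

Since one coefficient determines the other, any instability of Q under a change of dividing lines is inherited by the colligation coefficient.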





All along EOG taken as one dividing line (the other will be perpendicular
to it), Q = 0; all along DOF also, Q is zero. All along the arcs EF and DG,
Q = +1; all along the arcs FG and DE, Q = −1. These values are indicated by
numbers in the diagram. If the frequency surface be non-re-entering†, then when
the axis of the rectangular dividing planes is taken anywhere in the quadrants
EOF and DOG, Mr Yule’s Q is positive and varies from 0 to 1; if this axis be


* Pearson, Drapers’ Research Memoirs, “On the Theory of Contingency,” p. 6 (Dulau and Co.), and
Biometrika, Vol. I. p. 155.

† If the curve be a re-entering one, the rapid value changes of Mr Yule’s coefficients are still more
remarkable!
taken in the quadrants EOD and FOG, Q is negative and varies from 0 to −1.
Hence for two continuous variates, whose actual correlation is zero but which 
are not homoscedastic, we find Mr Yule’s coefficient will run through the whole 
range of possible values from —1 to +1, and what its observed value will be 
depends solely on where we take our dividing planes. 


It is perfectly true that Mr Yule’s Q vanishes if the variables are theoretically 
independent, but no variables are practically independent, and in actual statistics 
of two continuous variates when the grouping is fairly small we do not get DEFG
a rectangle, the sole bounding contour for which Mr Yule’s coefficient of association 
is zero all round. The fact that Mr Yule gets +1 or —1 for his coefficient round 
his boundary contour would be of small importance were it not that he appears 
to hold that, when Q= +1, then its probable error is zero. Round the bounding 
contour of a distribution of this kind Pearson’s normal coefficient has usually a big 
probable error, and the investigator is thus warned that its vagaries are of no 
account*. When the investigator comes, however, to a fourfold classification leading 
to Q = +1 by the zero of one of its classes, he would, if he were to follow Mr Yule,
assume his result absolutely reliable, and to be so independently of the total 
population used. As a matter of fact, his dividing lines may have given Q=+1 
solely because they chanced to be taken near the bounding contour of a frequency 
distribution, of which the investigator knows nothing! The real fact of the case 
is that Mr Yule’s investigation of the probable error of Q, while correct as long 
as the frequency of any of his four classes is substantial, fails entirely when one 
of his four classes is zero, and is correspondingly in error when Q is very large 
owing to one of the four classes being very small. Even if Mr Yule determines 
the probable error of Q for such cases by higher approximations, it will be 
meaningless without consideration of the frequency distribution of Q, which is 
an exceedingly skew curve for high numerical values of Q. 
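The behaviour complained of here can be seen with a few lines of arithmetic. The sketch below assumes the usual large-sample formula for the standard deviation of Q, ½(1 − Q²)√(1/a + 1/b + 1/c + 1/d), with the probable error taken as .6745 of it; the formula is not printed in this section, and the fourfold tables are hypothetical.

```python
from math import sqrt


def yule_q(a, b, c, d):
    """Coefficient of association Q = (ad - bc) / (ad + bc)."""
    return (a * d - b * c) / (a * d + b * c)


def probable_error_q(a, b, c, d):
    """Probable error of Q by the assumed large-sample formula.

    The formula is undefined when a class frequency is zero: the factor
    (1 - Q^2) vanishes while the sum of reciprocals becomes infinite.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("formula undefined when a class frequency is zero")
    q = yule_q(a, b, c, d)
    sigma = 0.5 * (1.0 - q * q) * sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return 0.6745 * sigma


# Hypothetical tables in which one class is made progressively smaller.
for b in (10, 5, 1):
    table = (40, b, 20, 30)
    print(table, "Q =", round(yule_q(*table), 3),
          "p.e. =", round(probable_error_q(*table), 4))
```

In this hypothetical series the formula returns a smaller probable error as the second class shrinks from 5 to 1 and Q rises towards +1, which is the apparent gain in reliability questioned in the text; with a zero class the expression has no definite value at all.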


Before we leave these cases of zero correlation it is worth while to indicate 
how Q works for various artificial frequency surfaces. 


(1) A rectangular block. Q is zero all round the boundary and for all
possible divisions.














(2) A square prism; diagonal planes parallel to the variate axes. Q is positive 
in two quadrants and negative in two, and takes all values between +1 and —1 
according to the point of division. It is essential to note that Q does not, as 





* The matter is discussed more at length in Appendix I. 
might be thought, give small values except in immediate proximity to the
boundaries, it rises to quite substantial values when the percentage in any
quadrant takes a value which Mr Yule has not hesitated to use when criticising
other methods of approaching the problem of association, and (according to
Mr Yule) as it approaches these values the reliability of his coefficient increases!


Diagram II. Frequency surface of zero correlation exhibiting every possible variation of
Q with different dichotomic lines.
(3) Let us now take a cylinder on circular base. Here again the correlation
coefficient is zero, but Mr Yule’s coefficient runs through its whole range from −1
to +1, being negative in two quadrants and positive in the other pair.
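The case of the cylinder on a circular base is easily imitated numerically. The following is a minimal sketch, not the authors' computation: it replaces the uniform disc by a fine symmetric grid of points, and the sign convention (which class is called a, b, c or d) is an assumption of the example, so that which pair of quadrants comes out positive depends on that choice.

```python
def fourfold_on_disc(h, k, n=200):
    """Count grid points of the unit disc in the four classes made by the
    dividing lines x = h and y = k.

    Assumed convention: a = (x > h, y > k), b = (x <= h, y > k),
    c = (x > h, y <= k), d = (x <= h, y <= k).
    """
    a = b = c = d = 0
    for i in range(-n, n):
        x = (i + 0.5) / n          # midpoints, so no point falls on an axis
        for j in range(-n, n):
            y = (j + 0.5) / n
            if x * x + y * y > 1.0:
                continue
            if y > k:
                if x > h:
                    a += 1
                else:
                    b += 1
            else:
                if x > h:
                    c += 1
                else:
                    d += 1
    return a, b, c, d


def yule_q(a, b, c, d):
    """Coefficient of association Q = (ad - bc) / (ad + bc)."""
    return (a * d - b * c) / (a * d + b * c)


def mean_product(n=200):
    """Mean of xy over the grid; it vanishes by symmetry, so r = 0."""
    total = 0.0
    count = 0
    for i in range(-n, n):
        x = (i + 0.5) / n
        for j in range(-n, n):
            y = (j + 0.5) / n
            if x * x + y * y <= 1.0:
                total += x * y
                count += 1
    return total / count


print("mean product:", mean_product())
for h, k in [(0.0, 0.3), (0.4, 0.4), (0.8, 0.8), (-0.8, 0.8)]:
    print((h, k), "Q =", round(yule_q(*fourfold_on_disc(h, k)), 3))
```

The product moment vanishes whatever the dividing lines, while Q passes from zero on an axis of symmetry to −1 or +1 as the point of division approaches the bounding circle.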


The above cases illustrate, we think effectively, the point that the coefficient of
association tells us absolutely nothing, unless we make some assumption as to the
nature of the surface of frequency. For surfaces of zero correlation it must always
take all values from +1 to −1 according to the position of the dividing planes.


Diagram III. Frequency surface of zero correlation exhibiting every possible variation
of Q with different dichotomic lines.


We will take one more of these zero correlation tables because it leads up to
certain new points. Such a table as that given as Table VIII might well occur in
practice; worked out by the product-moment method, the regression lines are
linear and there is zero correlation. Table IX shows how extraordinarily the
coefficient of association varies from point to point of division. It is not only zero
along the axes, but zero along a contour line in each quadrant. Thus, starting from
the centre of the surface, we descend in the first and third quadrants to a negative
association −.45, then crossing the zero contour, we rise to a positive value of
+.50 and ultimately reach +.87! In the second and fourth quadrants the process
is reversed; we rise first to +.45, then sink to zero and descend first to −.50 and
then to −.87. What information can the coefficient of association give us as to
the nature of these two uncorrelated variates? It is no doubt a measure in some 
manner of the heteroscedasticity of the arrays. But how and in what way does 
it measure this phase of want of independence by a value which varies from —1 
to +1 several times over? If it in some manner measures this heteroscedasticity, 
it is only by its local values, it measures nothing of the dependence of the variates 
as a whole. Will Mr Yule tell us how to infer whether, when Q = +.87 or −.87,
it is a measure of the relation between the means of the arrays corresponding 
to given variates, or is merely a measure of the differences in the variabilities of 
those arrays? Will he also tell us in what manner, by a multiplicity of values, 
it measures mere heteroscedasticity ? 

Are we doing Mr Yule an injustice in taking any notice of Q at the extreme 
limits of our table, e.g. of such values as +.81 or −.81? Well, consider the
corresponding fourfold table : 


       3 |     707 |     710
      77 | 174,295 | 174,372          Q = +.811
      80 | 175,002 | 175,082

and multiply each entry by the factor

      32,527,843 / 175,082 = 185.78633,

we find

     557 |    131,351 |    131,908
  14,306 | 32,381,629 | 32,395,935
  14,863 | 32,512,980 | 32,527,843


Compare this with the fourfold table deduced by Mr Yule from the Census 
data of 1901: 

















                                        Blindness.

                              Present |     Absent |     Totals
Mental        Present ...         558 |    132,096 |    132,654
Derangement   Absent ...       24,759 | 32,370,430 | 32,395,189          Q = +.693,
              Totals           25,317 | 32,502,526 | 32,527,843


or the following one from the same Census, again formed by Mr Yule: 











                                        Deaf-mutism.

                              Present |     Absent |     Totals
Blindness     Present ...          96 |     25,221 |     25,317
              Absent ...       15,150 | 32,487,376 | 32,502,526          Q = +.78
              Totals           15,246 | 32,512,597 | 32,527,843
and it will be obvious that we are doing Mr Yule no injustice at all. Our 3 in
175,082 is 557 in 32,527,843 and is far higher than Mr Yule’s 96 in 32,527,843!
Why, his 96 in over 32 millions might almost consist of the persons who had
been rendered at the same instant both deaf and blind by accident! Yet on the 
basis of this result Mr Yule has asserted a “very high association” between deaf- 
mutism and blindness! 
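The arithmetic of this comparison can be retraced directly from the figures printed above. The sketch below is only a check on those figures: it recomputes Q for the corner of the artificial table and for the two Census tables, and scales the artificial table up to the Census population. The variable names are ours.

```python
def yule_q(a, b, c, d):
    """Coefficient of association Q = (ad - bc) / (ad + bc)."""
    return (a * d - b * c) / (a * d + b * c)


# Extreme corner of the artificial zero-correlation table (total 175,082).
artificial = (3, 707, 77, 174295)

# Census of 1901 tables as printed above (total 32,527,843).
first_census_table = (558, 132096, 24759, 32370430)
second_census_table = (96, 25221, 15150, 32487376)

# Raise the artificial table to the Census population.
factor = 32527843 / sum(artificial)
scaled = tuple(round(entry * factor) for entry in artificial)

print("factor:", round(factor, 5))
print("scaled table:", scaled, "total:", sum(scaled))
print("Q, artificial table:   ", round(yule_q(*artificial), 3))
print("Q, scaled table:       ", round(yule_q(*scaled), 3))
print("Q, first Census table: ", round(yule_q(*first_census_table), 3))
print("Q, second Census table:", round(yule_q(*second_census_table), 3))
```

Multiplying every class by the same factor leaves Q unaltered apart from rounding, so the artificial corner and the Census tables are compared on equal terms.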


But, since Mr Yule has no hypothesis as to the nature of his frequency, why 
should not the relation between blindness and deaf-mutism be precisely like that 
of the nature of the variates exhibited in Table VIII? If this be so, what has 
Mr Yule’s coefficient of association told us? The variates would be actually 
uncorrelated, but we should anticipate : 


(i) That extremely bad sight would be associated with extreme deafness— 
this is the above Q = +.78.


(ii) Extremely bad sight would be associated with great aural acuity. This 
has often been asserted of the blind. Here Q would be high and negative. 


(iii) Extremely good sight would be associated with extremely good hearing, 
i.e. Q would be high and positive. Persons with exceptionally good capacity of one 
sense very frequently have it of another sense. 


(iv) Extremely bad hearing would be associated with exceptionally good 
vision, i.e. Q would be high and negative again. This is quite possible, although 
we have no conclusive evidence on the point. In a small school for deaf-mute 
girls 90% of the children were found to have normal vision (§ or $), none
had worse than ;4. In a group of children of normal hearing with light hair
only 69% had normal vision and 11.5% with vision of ; or worse. London
girls give 85%, Glasgow 82% and Edinburgh 80% normal vision, all lower
values than in the case of our small sample of deaf mutes. 


Results (i)—(iv) would hold if there were no correlation between goodness 
of sight and hearing—the average sight of a very deaf person being the same 
as one of normal hearing—provided the variability in sight of the very deaf were 
less than that of the general population, and the variability in hearing of the very 
blind were also less than that of the general population *. 


Thus given a fourfold table which is based upon continuous variation, if we 
make no hypothesis with regard to the nature of the frequency, we have in fact 
no idea at all of what Mr Yule’s coefficients of association and colligation really
measure. They measure in some form or another deviation from independence,
it may be true correlation or it may be heteroscedasticity, and divisions taken at
very slight distances apart may give hopelessly divergent values of Q, of which
difference of values Mr Yule has given no intelligible interpretation. 


* In the case of our Table VIII, the variability of the horizontal character for the whole population
= 1.6686, and for the vertical character 1.5979. The variability of the combined two top arrays is
1.6444, and for the extreme vertical column on the right 1.8166. Had the variabilities of the two sets
of arrays been the same as those of the general population, the association would have vanished.
If we turn from surfaces of zero correlation to those of finite correlation, we
find in the same way that Q takes innumerable values which have no mutual
relationship. Heron has already demonstrated this as far as the Gaussian surface
of frequency is concerned*. For example, in a Gaussian distribution Mr Yule’s
association can take every value from .37 to 1.0 if the correlation be truly .3
and the divisions be taken along the diagonal. These give for a Gaussian
distribution the complete range of Q, but it by no means follows that this is 
true for other types of frequency. Mr Yule, however, makes no hypothesis as to 
the nature of his frequency surface. To test the kind of meaning Q conveys, 
suppose the frequency surface to be a rectangular block, length 2a, breadth 2b, 
height h—that is to say, within a given rectangular area the frequency of all 
combinations of the two variates is equally probable. If the block slopes with its 
side 2a at an angle of θ to the x-variate, we have the correlation

$$ r = \frac{(a^2 - b^2)\cos\theta\,\sin\theta}{\sqrt{a^2\cos^2\theta + b^2\sin^2\theta}\;\sqrt{a^2\sin^2\theta + b^2\cos^2\theta}} . $$
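As a quick numerical check on this expression, the sketch below evaluates it for a block whose longer side is half as long again as the shorter, set at 45° to the x-variate. This is how we read the special case of the diagram discussed next; the function name is ours.

```python
from math import cos, sin, sqrt, radians


def block_correlation(a, b, theta):
    """Correlation for a uniform rectangular block of sides 2a and 2b whose
    side 2a makes the angle theta (in radians) with the x-variate."""
    numerator = (a * a - b * b) * cos(theta) * sin(theta)
    sd_x = sqrt(a * a * cos(theta) ** 2 + b * b * sin(theta) ** 2)
    sd_y = sqrt(a * a * sin(theta) ** 2 + b * b * cos(theta) ** 2)
    return numerator / (sd_x * sd_y)


r = block_correlation(1.5, 1.0, radians(45))
print("r =", round(r, 4))

# A square block, or one with its sides parallel to the axes, gives r = 0.
print(block_correlation(1.0, 1.0, radians(30)))
print(block_correlation(2.0, 1.0, 0.0))
```

For a = 1.5b and θ = 45° the expression reduces to (2.25 − 1)/(2.25 + 1) = 5/13, in agreement with the value .3846 quoted for the diagram.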








The regression lines are built up of straight lines; for a considerable distance 
they coincide with the axes of symmetry, but are afterwards bent round horizontally 
or vertically as the case may be. At the four corners Q=0 and along one pair 
of parallel sides Q=+1, along the other Q=—1. Thus two contour lines of 
zero association pass through the corners pair and pair. In the accompanying 
diagram (p. 203) the values of Q are given for the special case where θ = 45°, a = 1.5b,
and accordingly r = .3846. Going along the longer axis Q changes from −1.0 to
−.295 and so to zero, then it rises to +.6 at the centre and falls to zero again,
becoming negative and ultimately concluding with −1.0 at the boundary. Along
the shorter axis Q varies from +1.0 to +.6 at the centre and rises to +1.0 again
below it. What can be learnt as to the real association of two variates by a
coefficient which behaves in this way? The case would be quite different if 
Mr Yule had indicated a type of surface for which Q was constant for all 
divisions and demonstrated that it represented, even with moderate approxi- 
mation, such distributions as occur in statistical practice. One such surface of 
“stable association” at any rate is known for the tetrachoric 7 treated as a 
coefficient of association merely, and that surface is not widely divergent from a 
considerable number that we actually meet with. 
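
The quoted correlation for the rectangular block can be checked numerically. The sketch below is an illustration only: it assumes a uniform block with half-sides a and b (variances a²/3 and b²/3 along its own axes) turned through θ, and the function name `block_correlation` is introduced here, not taken from the paper.

```python
import math

def block_correlation(a, b, theta_deg):
    """Product-moment correlation of a uniform rectangular block.

    a, b      : half-length and half-breadth of the block
    theta_deg : angle between the side 2a and the x-variate, in degrees
    """
    t = math.radians(theta_deg)
    c, s = math.cos(t), math.sin(t)
    numerator = (a**2 - b**2) * c * s
    denominator = (math.sqrt(a**2 * c**2 + b**2 * s**2)
                   * math.sqrt(a**2 * s**2 + b**2 * c**2))
    return numerator / denominator

# Special case of the diagram: theta = 45 degrees, a = 1.5 b.
r = block_correlation(1.5, 1.0, 45.0)
print(round(r, 4))  # 0.3846
```

At θ = 45° the expression reduces to (a² − b²)/(a² + b²), which for a = 1.5b is 1.25/3.25.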


(b) The Fallacy of Mr Yule’s Selection in the case of Continuous Variates. 


If Q be not even approximately comparable with itself when taken on the
same surface with different dichotomies, how can it be comparable as a measure
of any real relationship from one surface to a second? Mr Yule will no doubt
reply that a function of Q does measure certain percentages when the table
is dressed in an equalised symmetrical form. Our reply is that that form has
been obtained by a method of selection which makes very large changes in every
other coefficient, including the Boas-Yulean, which has been used to measure
relationship; that we dispute entirely the legitimacy of such selection, which is of
a singularly arbitrary character, both in the extent to which it is applied and the
region to which it is circumscribed. If we apply it to continuous frequency
surfaces, so that a certain Q remains constant, all other Q’s are changed, and r
taken as a measure of the relationship of the variates as a whole is often
immensely changed. A correlation table has a multiplicity of Q’s and one
product-moment r; a process of selection, which changes all Q’s but one, and widely
modifies r, has attributed to it by Mr Yule some special merit by which that Q,
in preference to any other, is considered for the time being to measure the
“association” of the variates!

DIAGRAM IV. Q for frequency surface a right six-face. To illustrate how variation of Q depends
on form of distribution and how it has no relation whatever to true correlation. Actual
correlation .3846.

* Biometrika, Vol. VIII. p. 109.

Any argument, as we have already indicated, which is valid when applied to
the columns and rows of a fourfold table ought to be valid when applied to the
columns and rows of a multifold table. Such a table should also not be affected
by selection. Well, let us take the Table from the Census of the age of husband
and wife and let us select so as not to change certain Q’s and see what effect
this has on the correlation. The following tables give Q and r before and after
a series of selections.


Let the divisions be at the same ages for both husband and wife, say under 30 
and over 30. 


Coefficient of Association    = .9745
Coefficient of Colligation    = .7958
Correlation before Selection  = .9136*

Percentage of those over 30 years selected:

  Percentage   Actual Correlation   |   Percentage   Actual Correlation

    100 %           .9136           |       5 %           .6969
     10 %           .8347           |       4 %           .6479
      9 %           .8175           |       3 %           .5921
      8 %           .7961           |       2 %           .5360
      7 %           .7698           |       1 %           .4952
      6 %           .7369           |       0 %           .4850


In other words the selection which reduces the actual correlation from .914
to .485 leaves Mr Yule’s coefficients of association and colligation unchanged!
No, this is not true; every other coefficient of association and colligation for 
the table is changed, except the particular two for the arbitrary division at 
30 years! What legitimate inference of any kind can be drawn from the constancy 
of this individual pair ? 
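
The constancy complained of here is an algebraic property of the cross-ratio ad/bc, and a small sketch makes it visible. The counts below are hypothetical, the helper names are introduced for illustration, and the fourfold product-moment coefficient stands in for the full-table correlation used in the text, so this is a toy analogue of the Census calculation rather than a reproduction of it.

```python
import math

def yule_q(a, b, c, d):
    """Yule's coefficient of association for the fourfold table a b / c d."""
    return (a * d - b * c) / (a * d + b * c)

def colligation(a, b, c, d):
    """Yule's coefficient of colligation for the same table."""
    ad, bc = math.sqrt(a * d), math.sqrt(b * c)
    return (ad - bc) / (ad + bc)

def phi(a, b, c, d):
    """Product-moment correlation of the table, each class scored 0 or 1."""
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))

def select(table, row_keep=(1.0, 1.0), col_keep=(1.0, 1.0)):
    """Keep a fixed fraction of each row class and each column class."""
    a, b, c, d = table
    r1, r2 = row_keep
    c1, c2 = col_keep
    return (a * r1 * c1, b * r1 * c2, c * r2 * c1, d * r2 * c2)

table = (400.0, 100.0, 100.0, 400.0)   # hypothetical counts
thinned = select(table, row_keep=(1.0, 0.1), col_keep=(1.0, 0.1))

for name, t in (("before", table), ("after", thinned)):
    print(name, round(yule_q(*t), 4), round(colligation(*t), 4), round(phi(*t), 4))
```

Thinning a row or column class multiplies ad and bc by the same factor, so Q and the colligation cannot move, while the product-moment coefficient of the same table falls.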


Now let us select husbands and wives unequally but still at age 30 divisions: 


  Husbands              Wives                 Actual Correlation

  100 %                 100 %                       .914
  10 % over 30          10 % under 30               .908
  1 % over 30           10 % under 30               .850
  10 % under 30         1 % over 30                 .715

  1 % under 30          0.1 % over 30               .285
  .000 % under 30       .000 % over 30             −.009
  .000 % over 30        .000 % under 30            −.038


During all these operations which reduce the actual relationship as measured
by correlation from the very high value .914 to zero and even to negative values,
Mr Yule’s association and colligation for his selected dichotomies show the constant
“very high values” .975 and .796. At other dichotomies of course they cover
pretty well the whole possible scale. 


* Value without Sheppard’s corrections, because in dealing with selection it is not clear that those
corrections are always appropriate.

Nor has the result anything to do with the division at age 30. If we divide
at 21 years we find:

  Percentage selected over 21     Actual Correlation

        100 %                          .914
         10 %                          .939
        [illegible]                    .941
        [illegible]                    .493
        .000 %                         .272

Here the Q and ω of Mr Yule retain throughout the operations the values
.987 and .853, which mark, we have been told, “very high association”!


Nor is the absurdity in the least confined to ages of husband and wife. Let
us take stature in Father and Son* and divide into a fourfold at Fathers 67.5″
and over, and Sons at 68.5″ and over. The actual correlation is .520, Q = .683
and ω = .395.

  Selection of Father     Selection of Son       Actual Correlation

  100 %                   100 %                        .520
  10 % over 67.5″         10 % over 68.5″              .314
  10 % under 67.5″        10 % over 68.5″              .275
  .000 % over 67.5″       .000 % over 68.5″            .251
  1 % under 67.5″         1 % over 68.5″               .185


The result is exactly the same as before, the real relationship is immensely 
modified by selection, while the colligation and association remain unchanged for 
one pair of arbitrary divisions and for this one only. What can be learnt from 
such a statistical method? We venture to believe that from the standpoint of 
common sense it is wholly without meaning. 
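
The pairs of values quoted for association and colligation are tied together by the identity Q = 2ω/(1 + ω²), which follows from ω = (√ad − √bc)/(√ad + √bc). A minimal check against the three pairs printed above, with a function name chosen here for illustration:

```python
def q_from_colligation(omega):
    """Yule's association Q expressed through his colligation omega."""
    return 2 * omega / (1 + omega ** 2)

# (colligation, association) pairs as printed in the text
pairs = [
    (0.7958, 0.9745),  # husband and wife, division at 30 years
    (0.853, 0.987),    # husband and wife, division at 21 years
    (0.395, 0.683),    # stature of father and son
]

for omega, q_printed in pairs:
    print(omega, round(q_from_colligation(omega), 4), q_printed)
```

Since each coefficient is a function of the other, the constancy of one under selection carries the constancy of the other with it.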


What is the precise physical character which is to be attached to this wide 
difference between “association” and correlation? That correlation is affected 
by selection we know only too well; it is one of the factors of progressive 
evolution under natural selection, but what profitable meaning of any kind 
is to be attached to the statement that one out of an indefinite number of 
associations has remained unchanged by a special selection? Does not the 
principle that “association” or “colligation” is unchanged by selection arise 
from the fact that Mr Yule has merely guessed a denominator to his coefficient,
which denominator has no theoretical justification of any kind; and his
principle that selection makes no change is a later discovery and has no validity
at all, for it is not a “natural” selection and has no generality beyond the fourfold 
table ? 


We think we have sufficiently indicated that Mr Yule’s coefficients of associa- 
tion and colligation fail entirely for any variates which may be suspected in any 
way of continuity, and the bulk of the variates to which Mr Yule has applied his 
methods undoubtedly have such continuity. 


* Biometrika, Vol. II. p. 415.

Mr Yule would no doubt tell us that he has distinctly stated that he distinguishes
between correlation and association and that he knows they may lead
to diverse results. We reply that, wherever there is any real continuity, the 
assumption of a discrete “attribute” disguises its existence and will lead to 
erroneous conclusions. Further he directly states* that: “The methods applicable 
to the former kind of observations, which may be termed STATISTICS OF ATTRI- 
BUTES, are also applicable to the latter STATISTICS OF VARIABLES. A record 
of statures of men for example may be treated by simply counting all measure- 
ments as tall that exceed a certain limit, neglecting the magnitude of excess or 
defect, and stating the numbers of tall and short (or more strictly not-tall) on 
the basis of this classification. Similarly, the methods that are specially adapted 
to the treatment of statistics of variables, making use of each value recorded, are 
available to a greater extent than might at first sight seem possible for dealing 
with statistics of attributes. For example, we may treat the presence or absence 
of the attribute as corresponding to the changes of a variable which can only 
possess two values, say 0 and 1.” 


Here Mr Yule directly claims that his methods can be applied to stature, and 
in the next sentence suggests that it is reasonable to treat the difference between 
any tall man and any short man as unity because they have been placed under two 
class-indices “ tall” and “short”! He started his statistical work from the stand- 
point of the pure logician, and he does not perceive that he is applying his 
reasoning to the class-names of things and not to the things themselves behind 
these names. Let us take head length and head breadth with a correlation, 
say, of .50, and lengths of femur and humerus with a correlation, say, of .60, then
it is perfectly easy by selecting your head length and breadth division and your 
femur length and humerus length divisions, to make the association between head 
length and breadth either greater or less than that between femur length and 
humerus length. What is the value of the coefficient of association as a measure 
of relationship if this be the case? Every new division gives a different ratio 
of association for the two sets of attributes. The application of such methods 
in practice can only tend to the detriment of modern statistical theory. 


(8) On the Application of Mr Yule’s Coefficients to Discrete 
Variates. Mendelism. 


May we not, however, accept Mr Yule’s claims for his coefficients of association 
when the classes differ by a real discrete unit—not a unit arbitrarily introduced 
by calling a “short” man 0 and a “tall” man 1? Well, the difficulty is to find 
such cases. However, supposing them to exist, then there is no question that the 
coefficients of association and colligation are not the right methods of approaching 
the problem, but that the ordinary product-moment correlation coefficient is the 


* Theory of Statistics, pp. 7–8.

correct method; but this will lead us to results absolutely opposed to those of
Mr Yule’s association and colligation.


We ourselves, however, doubt the existence of this discrete unit; we have
only come across it in theoretical Mendelian investigations and doubt whether
the “unit character” which is absent or present has any existence in somatic
classifications. Of this we should like to give two or three illustrations.


The first illustration we take is from a paper* by Professor E. M. East, 
entitled “The Mendelian Notation as a Description of Physiological Facts.” 
Professor East is a vigorous Mendelian making the very best defence he can 
of the Mendelian notation in the present parlous condition of Mendelian theory 
which assumes the truth of the unity of the unit. The cross which Professor 
East cites is one between a “long” corolla and a “short” corolla race of Nicotiana. 
Speaking of other Mendelian investigations into size, Professor East writes of
them: “No criticism could be made except that certain of the characters used
varied considerably in the mother varieties and therefore were presumably not
homozygous for all character factors†. This criticism is apparently answered by
a recent investigation of the writer’s, as yet unpublished, where two species,
Nicotiana forgetiana and Nicotiana alata grandiflora, were crossed. As seen by
the table, the corolla length is very slightly variable in either species, nor is it
affected to any extent by environment, yet each species was absolutely reproduced
by recombination in the F2 generation.”


TABLE X.

Frequency Distributions for Length of Corolla in a Cross between Nicotiana
forgetiana (314) and N. alata grandiflora (321).

                              Class Centres in Millimetres
Designation       20   25   30   35   40   45   50   55   60   65   70   75   80   85   90   Totals

314                9  133   28    —    —    —    —    —    —    —    —    —    —    —    —     170
321                —    —    —    —    —    —    —    —    —    1   19   50   56   32    9     167
(314 × 321) F1     —    —    —    3   30   58   20    —    —    —    —    —    —    —    —     111
(314 × 321) F2     —    5   27   79  136  125  132  102  105   64   30   15    6    2    —     828


Now let us call short all “below 60 mm.” and long all “above 60 mm.” We 
have all the offspring “below 60 mm.” Hence there is “dominance” of short 
corolla, and we may apply the magic formulae : 

(DD) x (RR) = 4(DR), 
(DR) x (DR) =(DD) + 2(DR) +(RR), 
with the result that the segregating generation F2 shows 222 (RR)’s, a very
plausible Mendelian quarter.

* The American Naturalist, Vol. XLVI. p. 639.
† Note the writer’s interpretation of the results by the preconceived theory!
‡ The author does not tell us how many plants were grown in each generation. In the parent
generation presumably only two. How many were considered in F1 and F2? The difference of
variability would largely turn on this factor.


Prof. East does not do this, although it was a step absolutely compatible
with current impressionist classifications by Mendelians. All he says is that
“each species was absolutely reproduced by recombination in the F2 generation”
and “I do maintain that the Mendelian notation satisfies the facts of
size inheritance as well as it satisfies the facts of qualitative inheritance”
(l.c. p. 639).

Well, the fourfold Mendelian table of somatic characters would give in the
F2 generation (RR) = 222, (DD) + (DR) = 606*. Now will anything be discovered
by assuming that those two groups differ by a unit? This is the “dichotomy” of
Mr Yule’s association. We contend that, while theoretically (RR) differs from
(DR) by having no D as against one D and from (DD) by the latter’s having
two D’s, this theory is idle when pushed into actual Mendelian statistics†. The
division at 60 is not a dichotomy of things differing by a unit, except in name,
but an arbitrary cut across a continuous distribution, and the application of either
the Boas-Yulean φ or such coefficients as those of association and colligation is
entirely misleading. We are told that “short” is “dominant” over “tall” as
a result of experiment, and the importance of dividing at 60 mm. to get our
tables is dropped out of sight.
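
The dependence of the “Mendelian quarter” on the point of division can be seen by counting the F2 row of Table X directly. In the sketch below the frequencies are those of the table, the entry 132 at 50 mm. being the value required by the printed total of 828; the class centred on the cut is counted with the upper group, which is the convention that yields the 222 of the text. The function name is an illustrative one.

```python
# F2 frequencies of Table X, keyed by class centre in millimetres.
f2 = {25: 5, 30: 27, 35: 79, 40: 136, 45: 125, 50: 132, 55: 102,
      60: 105, 65: 64, 70: 30, 75: 15, 80: 6, 85: 2}

def split(freq, cut):
    """Counts below the cut and at-or-above the cut."""
    low = sum(n for centre, n in freq.items() if centre < cut)
    high = sum(freq.values()) - low
    return low, high

total = sum(f2.values())
for cut in (60, 35):
    low, high = split(f2, cut)
    print(cut, low, high, round(low / total, 3), round(high / total, 3))
```

Cutting at 60 mm. gives 606 against 222, a plausible quarter of “long”; cutting at 35 mm. leaves only 32 of the 828 in the lower group, so the quarter disappears although the data are unchanged.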


Piebaldism is another unit character of the Mendelians, and doubtless 
Mr Yule would be content to take his dichotomy between piebald and whole 
colour. Now here are 2314 mice classified according to the ratio of pigmented 
area of coat to whole area of coat‡:

  Total | .10, .15 | .20, .25 | .30, .35 | .40, .45 | .50, .55 | .60, .65 | .70, .75 | .80, .85 | .90, .95 | .975 | 1.00
  2314  | 6 | 31 | 60 | 81 | 126 | 111 | 138 | 176 | 79 | 393

Where shall we make the division between
the piebald and the whole colour? The group .975 largely refers to individuals
with a very small white area on belly; but as a matter of fact the 1.00 group
has been divided into three sub-groups of individuals who, without having white 
areas on belly, have or have not somewhat lighter pigmented areas there. Where 
is the true dichotomy, especially when we can show that each grade of pie- 
baldism is hereditary? Mr Yule would no doubt apply his association to 
piebalds and whole colours as giving a Mendelian unit, but in doing so he 
will be applying his dichotomy to words, to class-indices and not to the real 
things represented by them. 

* Whether this means anything or not would of course depend on the 105 at 60 never in later
generations giving anything below 60!

† Take the division at 35 mm. and we have dominance of long corolla, but the Mendelian quarter
now fails.

‡ From the late W. F. R. Weldon’s mice data now at press.

Again, an illustration from Mendelian dichotomy may be found in a paper
by Hurst entitled “Mendel’s ‘Law’ applied to Orchid Hybrids*.” He desired
to give a proof that the F2 generation consists of 50% of (DR)’s and 50%
of “specifics,” (DD)’s and (RR)’s. He recognised that the first cross gave an
“intermediate,” so he defined his (DD) as all those which show 2(DD) character
and more, his (RR) as all those that show (RR) character and more, and the
“intermediates” or apparently the (DR)’s all those that show character between
®(DD) and 3(RR). As a result his “specifics” came out as 2281 and his
“intermediates” as 2267 in number, a plausible Mendelian 1:1 ratio. Thus the
classification into every one of the groups (DD), (DR) and (RR) in the F2
generation is by trisection of a continuous variate at arbitrary values†.

Pearson has come across an exactly similar instance of the vagueness of the
Mendelian unit in breeding dogs. If a short-muzzled dog be crossed with the
long-muzzled dog, the hybrid would be described by general impression, and was
so considered by him, as short-muzzled. The result was to indicate dominance
of the short-muzzle. But when muzzle indices were formed and the dogs’ heads
measured in a variety of ways, the hybrids were found to be intermediates, and,
crossed in again with the short-muzzled stock, they gave a group the mean of
which had a position intermediate between the hybrid and that original stock.
Each generation had very considerable variation. Dichotomy giving Mendelian
ratios was possible, provided an arbitrary division was taken across the continuous
distribution. Mr Yule’s unit difference, short-muzzle versus not short-muzzle,
would be a perfectly idle one across what in the F2 generation is a continuous
distribution.

* Journal of the Royal Horticultural Society, Vol. xxv. Part 4.

† Martin Leake has found continuity in the F2 generation with an intermediate mean in the case of
Indian cottons, Journal Asiatic Soc. of Bengal, N.S. Vol. IV. p. 13, 1908. Even more astonishing
frequency distributions for the F2 generation for “talls” and “shorts,” number of nodes and lengths
of internodes may be obtained from Mr R. H. Lock’s “Studies in Plant Breeding in the Tropics,”
Annals of Royal Botanic Gardens, Peradeniya, Vol. II. In these cases it is wholly impossible to speak
of a unit difference between the members of either class.

One of the most remarkable features, indeed, of the present situation is the
assumption that in some mysterious manner Mr Yule’s coefficients of association
or colligation can be applied to Mendelian results. Mr Sanger, in his contribution
to the discussion, said:

“One additional reason why he welcomed the Paper was that the rise of
Mendelian biology had made a great difference. There they were always dealing
with things which were discrete, whereas according to all Galtonian laws they
always dealt with things which were thought to be continuous. At present there
was this difficulty that mathematicians had a prejudice in favour of more elegant
mathematics, and the Mendelians had not yet learnt algebra; but that day would
come, and then Mr Yule’s work would be the work for the Mendelians” (J. R. S. S.
Vol. LXXV. p. 646).

Mr Yule nowhere repudiated this application of his coefficients, and yet they
are the last which can possibly be applied to Mendelian data! We put on one
side what Mr Sanger can possibly mean by “all Galtonian laws” dealing with
things thought to be continuous; he has clearly never read the treatment of eye-colour
in man nor of coat-colour in Basset hounds by Sir Francis himself, who
distinctly treated them as discrete quantities and applied “his laws” to them. But
we must indicate the fallacy of applying coefficients of association and colligation
to Mendelian characters. The reason for applying them is the assumption made
that a Mendelian character is a discrete unit. But if this be so, fourfold and
three by three Mendelian tables should be treated as discrete tables and true
product moments formed of them. We believe that one of us was absolutely the
first to apply these methods, treating the Mendelian theoretical characters as
units*, and his work has been followed up by a whole series of workers in the
Biometric Laboratory†. There was thus no question with the Biometric School
of how Mendelian theoretical problems should be dealt with, and Mr Yule wholly
misses the point when he states that Dr Snow’s “recent comments in Biometrika
on the use of the normal coefficient for Mendelian tables in Dr Brownlee’s paper”
were “a much stronger condemnation of Professor Pearson’s than Dr Brownlee’s
work” (J. R. S. S. Vol. LXXV. p. 651). Pearson has never used a normal correlation
coefficient on a true fourfold table‡ which he believed to be Mendelian in
character. He has only applied such coefficients when he believed the character
under consideration to be at bottom continuous, and as far as eye-colour is concerned,
the many dissections of eyes he has been able to examine in his recent
investigations as to albinism have confirmed rather than weakened that standpoint.
But even had he done so, although the use of such a normal coefficient
might be criticised on the ground of the labour involved in determining it, it cannot
be condemned on any other ground, for it is in all respects as good a coefficient of
association as Mr Yule’s Q or ω, and possesses the important property that it is
subject to selection, in opposition to the wholly fictitious merit which Mr Yule
claims for his coefficients, namely that they are uninfluenced by selection.

* Phil. Trans. Vol. 203 A, pp. 53–86, and R. S. Proc. Vol. 81 B, p. 225.
† Jacobs, R. S. Proc. Vol. 84 B, p. 23; Snow, R. S. Proc. Vol. 83 B, p. 37, and Biometrika,
Vol. vi. p. 420.
‡ The error of Dr Brownlee’s work was that he went back on all this and applied continuous
methods to theoretical Mendelian tables; see Proc. Roy. Soc. Edin. Vol. xxx. p. 473.

This point is so well illustrated by Mendelian theoretical tables, that we stay
to demonstrate it here. Let us consider the correlation of father and offspring
when a population represented symbolically by the fathers

      l(AA) + m(Aa) + n(aa)

is crossed at random with a population of mothers given by

      l′(AA) + m′(Aa) + n′(aa).

Here we can put N = l + m + n = total of fathers, l′ + m′ + n′ = N′ = total
mothers, and the fundamental Mendelian formulae

      (AA) × (aa) = 4(Aa),
      (AA) × (Aa) = 2(AA) + 2(Aa),
      (Aa) × (Aa) = (AA) + 2(Aa) + (aa)

are assumed to hold. If every possible father be mated with every possible
mother so as to insure random mating, we have the contingency table for father
and offspring:













                                            Father.
  Offspring    (AA)            (Aa)              (aa)            Totals

  (AA)         2l(2l′+m′)      m(2l′+m′)         0               (2l+m)(2l′+m′)
  (Aa)         2l(m′+2n′)      2m(l′+m′+n′)      2n(2l′+m′)      (2l+m)(m′+2n′) + (m+2n)(2l′+m′)
  (aa)         0               m(m′+2n′)         2n(m′+2n′)      (m+2n)(m′+2n′)

  Totals       4l(l′+m′+n′)    4m(l′+m′+n′)      4n(l′+m′+n′)    4(l+m+n)(l′+m′+n′)





Now there are four points at which we can make a fourfold classification, α, β,
γ and δ, and these will give tables for which the association and colligation in
Mr Yule’s sense can be calculated. Now we will suppose

      N = l + m + n and N′ = l′ + m′ + n′

to remain constant, so that the total population is the same. Then if we divide
at α no selection of (AA) or “dominant” fathers will affect the coefficient of
association; if we divide at β or δ, Mr Yule’s coefficients are unity, or there is perfect
relationship between parents and offspring, and if we divide at γ, no selection of
recessives will affect the Yulean association. What light can Mr Yule’s coefficients
possibly throw on Mendelian inheritance, when for two possible divisions they
make the parental relationship perfect and for the other two they give substantial
values of the relationship, but render it completely independent of selections,
which in reality widely influence the relationship of parent and offspring, if we
proceed by the theory of discrete units? If we accept, which the present writers
do not, the theory of dominance and assert that (AA) and (Aa) are somatically
identical, and represent one somatic character, then γ is the only reasonable
division to make, and the typical Mendelian table becomes:

                                         Father.
  Offspring       (AA), (Aa)                (aa)                    Totals

  (AA), (Aa)      4(l+m)N′ − m(m′+2n′)      2n{2N′ − (m′+2n′)}      4NN′ − (m+2n)(m′+2n′)
  (aa)            m(m′+2n′)                 2n(m′+2n′)              (m+2n)(m′+2n′)

  Totals          4(l+m)N′                  4nN′                    4NN′

We have at once from this table Mr Yule’s coefficient of association

$$ Q = \frac{N'(2l+m)}{N'(2l+m) + m(2l'+m')}. $$

But the correlation r, as found by the product-moment method for discrete unit
characters, is:

$$ r = \frac{\sqrt{n}\,\sqrt{m'+2n'}\;(2l+m)}{\sqrt{(m+2n)(l+m)\{4NN' - (m+2n)(m'+2n')\}}}. $$
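
The step from the fourfold table to the expression for Q may be written out. The abbreviations s = m′ + 2n′ and t = 2l′ + m′ are introduced here for brevity and are not in the original; they satisfy s + t = 2N′. With a, b, c, d the four cells of the table read row by row:

```latex
\begin{aligned}
a &= 4(l+m)N' - ms, \qquad b = 2nt, \qquad c = ms, \qquad d = 2ns,\\
ad - bc &= 2ns\,\{4(l+m)N' - m(s+t)\} = 4nsN'(2l+m),\\
ad + bc &= 2ns\,\{4(l+m)N' - m(s-t)\} = 4ns\,\{N'(2l+m) + m(2l'+m')\},\\
Q &= \frac{ad-bc}{ad+bc} = \frac{N'(2l+m)}{N'(2l+m) + m(2l'+m')}.
\end{aligned}
```

The factor n cancels, which is why no selection of recessive fathers can alter Q. In the product-moment r the same numerator ad − bc is divided by the square root of the product of the four marginal totals, 4(l+m)N′, 4nN′, (m+2n)s and 4NN′ − (m+2n)s, and there n does not cancel.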





Now examine Mr Yule’s coefficient:

(i) There is perfect association, Q = 1, when we take no heteromorphic
fathers (m = 0). The value of r is then

$$ r = \sqrt{\frac{l\,(m'+2n')}{2lN' + n\,(2l'+m')}}, $$

and takes, as it should do, all sorts of values according to the nature of the
mothers, and the proportions of true dominant and true recessive fathers in
the whole paternity. Thus, for all fathers recessive, it is zero, and for all fathers
true dominant (n = 0) with all mothers purely recessive (l′ = m′ = 0) it is unity.
Thus the true correlation under the influence of selection can take every possible
value while Mr Yule’s coefficient gives perfect association throughout!

(ii) There is perfect association, Q = 1, when we take only recessive mothers,
i.e. l′ = m′ = 0. The value of r is then

$$ r = \sqrt{\frac{n\,(2l+m)}{(l+m)(m+2n)}}, $$

and depends entirely on the distribution of fathers. When there are no heteromorphic
fathers (m = 0), it is unity. When there are no recessive fathers (n = 0),
it is zero. That is to say, while Mr Yule’s coefficient shows perfect relationship
throughout, the true correlation or real association can run through the whole
range from zero to unity.
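
The behaviour described in (i) and (ii) can be verified by building the fourfold table for the division at γ and computing both coefficients from the cell counts. The sketch below is illustrative: the population numbers are hypothetical, and the function names and the spellings `lp`, `mp`, `np_` (for l′, m′, n′) are introduced here.

```python
import math

def gamma_table(l, m, n, lp, mp, np_):
    """Fourfold table at the gamma division: (AA),(Aa) against (aa).

    Rows are offspring classes, columns are father classes; every mating
    is taken to contribute four offspring, as in the Mendelian formulae.
    """
    Np = lp + mp + np_
    s = mp + 2 * np_                      # m' + 2n'
    a = 4 * (l + m) * Np - m * s          # dominant offspring, dominant father
    b = 2 * n * (2 * Np - s)              # dominant offspring, recessive father
    c = m * s                             # recessive offspring, dominant father
    d = 2 * n * s                         # recessive offspring, recessive father
    return a, b, c, d

def yule_q(a, b, c, d):
    return (a * d - b * c) / (a * d + b * c)

def product_moment_r(a, b, c, d):
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))

def q_closed_form(l, m, n, lp, mp, np_):
    Np = lp + mp + np_
    return Np * (2 * l + m) / (Np * (2 * l + m) + m * (2 * lp + mp))

# Hypothetical populations: selecting recessive fathers (changing n)
# leaves Q untouched but moves r.
for n in (5, 50):
    t = gamma_table(3, 4, n, 2, 6, 1)
    print(n, round(yule_q(*t), 4), round(product_moment_r(*t), 4))

# Case (i), m = 0, and case (ii), l' = m' = 0: Q is unity in both.
case_i = gamma_table(3, 0, 5, 2, 6, 1)
case_ii = gamma_table(3, 4, 5, 0, 0, 1)
print(yule_q(*case_i), round(product_moment_r(*case_i), 4))
print(yule_q(*case_ii), round(product_moment_r(*case_ii), 4))
```

In both special cases one cell of the table vanishes, which forces Q to unity whatever the remaining frequencies may be, while r continues to respond to them.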


It is not too much to say that those who suggest that Mr Yule’s coefficients 
of association and colligation will be of service in Mendelian problems cannot have 
had any acquaintance with the nature of those problems at all. Mendelian theory 
relates to discrete units and every coefficient which is uninfluenced by selection is 
on that very ground wholly unsuited for use with such units. Selection modifies 
correlation when we deal with discrete units just as much as when we deal with 
continuous characters, and any coefficient is valueless which directly starts with 
the property that it will not be modified by selection. 


Mendelian practice classifies under unit designations individuals which, as we 
have just indicated, often show no sharp line of division at all. In such cases to 
treat the difference of two classes as a unit is juggling with class names, not 
dealing with the things so classed. Mr Yule has failed in these matters because
he starts from the field of pure logic and not from the observation and record of
actualities. Even if the actual Mendelian differences were units, not the differences
of continuous variation, then φ would be the right coefficient to use, not those
of colligation or association*. But even here the results will be often difficult to
interpret. In the usual case, however, of Mendelian practice, what we need is
not the value of a correlation, but an investigation of whether observation is a
reasonable fit to theory, i.e. we must use the ordinary “Goodness of Fit” test.
This point is discussed in Appendix II, as there has recently been some mis- 
interpretation of the matter. 


* We have taken the series of those symmetrical fourfold tables for which φ has always the Mendelian
value 1/3; the values of Q range from .6 to 1. What interpretation can association give of such
Mendelian tables?

† The evil done by Mr Yule’s preaching of association to the neglect of more general methods is
manifest in a recent paper by G. N. Collins in The American Naturalist, Vol. XLVI. p. 572. He gives
such numbers as the following for flower colour and long pollen in hybrid sweet peas, taken from
Bateson, Saunders and Punnett, Report III to the Evolution Committee, p. 9, 1906.

                     PURPLE               RED                 WHITE
                  Long    Round       Long    Round       Long     Round
  Observed        1528     106         117     381        1199      394
  Calculated      1448.5   122.7       122.7   401.5      1220.5    407.4

The calculated numbers are curious; the authors do not explain adequately how they have
obtained them. Assuming them to be correct, but of this we have doubts, the problem proposed
by Mr Collins is to determine whether the observed “Purple” and “Red” as distributed into
“Long” and “Round” are a random sample from the calculated values, i.e. we compare

  Observation     1528     106     117     381
  Theory          1448.5   122.7   122.7   401.5

Mr Collins remarks “No method has been proposed for making definite comparisons between such
series of numbers” (p. 572), and continues “A customary and direct method of comparing the degree
of relationship that exists between any two characters is to compute the coefficient of correlation or
Yule’s ‘coefficient of association.’” In the discussion which follows Mr Yule’s “coefficient of
association” (1900) is used. Considering the work of Pearson and Elderton on “Criteria of Goodness
of Fit” (Phil. Mag. Vol. L. 1900, pp. 157–175, and Biometrika, Vol. I. pp. 155–163), Mr Collins can
hardly have gone far in statistics, for how would he have proceeded had he included the “White”
in his series? The proper method appears to us the general one, i.e. to determine the probability
P of the recorded divergence between observation and theory, calculating

$$ \chi^2 = S\left\{\frac{(\text{observation} - \text{theory})^2}{\text{theory}}\right\} $$

and deducing P by Elderton’s Tables: see our Appendix II. In this case a deviation as large as that
observed would only occur once in twenty-one trials or the odds are 20 to 1. But we believe Messrs
Bateson and Punnett have done themselves injustice. We do not write this in disparagement of
Mr Collins’ work; he is undoubtedly right in demanding some test for “goodness of fit” in these
luxuriant Mendelian formulae.
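
The figure of one in twenty-one can be reproduced from the four observed and calculated frequencies. The sketch below is a modern restatement, not the procedure of Elderton’s Tables: it assumes three degrees of freedom (four groups, Pearson’s n′ = 4), which is an inference from the agreement of the result with the odds quoted, and it uses the closed form of the upper tail available for three degrees of freedom. The function names are introduced here.

```python
import math

observed = [1528, 106, 117, 381]            # Purple long, Purple round, Red long, Red round
theory = [1448.5, 122.7, 122.7, 401.5]      # calculated numbers as printed

def chi_square(obs, exp):
    """Sum of (observation - theory)^2 / theory over the groups."""
    return sum((o - e) ** 2 / e for o, e in zip(obs, exp))

def p_value_3df(x):
    """Probability of a chi-square at least as large as x, for 3 degrees
    of freedom: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2)."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

x2 = chi_square(observed, theory)
p = p_value_3df(x2)
print(round(x2, 3), round(p, 4), round(1 / p, 1))
```

Under these assumptions χ² comes out a little under 8 and P a little under .05, that is about one trial in twenty-one.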
(9) On the Limitation in Value of the Boas-Yulean φ.


Given two total variate frequencies, if we can assert nothing of the nature of
the distribution, the maximum value of the uncorrected mean square contingency
coefficient depends on the number of cells and cannot for a finite number of cells
exceed a certain limit. Mr Yule has spoken of this fact as if it were a serious blot
on the method of contingency. We do not agree with him, but it is singular that
if he thinks so, he should not have rejected the use of φ, the “theoretical value of
the correlation.” The fact that φ had a maximum limit was known to Mr Yule*,
yet he never throughout his paper refers to it as detrimental to his own “theoretical
value of the correlation.” Consider any table:


      n₁m₁/N + x   |   n₁m₂/N − x   |   n₁
      n₂m₁/N − x   |   n₂m₂/N + x   |   n₂
      ---------------------------------------
          m₁       |       m₂       |   N

This is the most general form the fourfold can take for given n₁, n₂, m₁, m₂. We
then have

$$ \phi = \frac{xN}{\sqrt{n_1 n_2 m_1 m_2}}. $$

This will be a maximum of a positive kind when x takes the largest possible
value, i.e. when x is equal to the lesser of n₁m₂/N and n₂m₁/N. It will be a maximum
of negative kind when −x is equal to the lesser of n₁m₁/N and n₂m₂/N. Thus φ always
lies between definite limits which may be most restricted.


Consider the table

2269 + x | 97261 − x | 99530
11 − x | 459 + x | 470
2280 | 97720 | 100000

Here the limits are given by x = 11 and x = −459, or we have the two tables

2280 | 97250 | 99530
0 | 470 | 470
2280 | 97720 | 100000

and

1810 | 97720 | 99530
470 | 0 | 470
2280 | 97720 | 100000

These give φ = .0106 and φ = −.4499.
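
The limits just described can be checked numerically. The following is only a sketch under stated assumptions: the function names `phi` and `phi_limits` are illustrative and do not occur in the paper, and the only formula used is φ = xN/√(n₁n₂m₁m₂) with x bounded so that no cell becomes negative.

```python
from math import sqrt

def phi(a, b, c, d):
    """Boas-Yulean phi of the fourfold table with rows (a, b) and (c, d)."""
    n1, n2 = a + b, c + d   # row totals
    m1, m2 = a + c, b + d   # column totals
    return (a * d - b * c) / sqrt(n1 * n2 * m1 * m2)

def phi_limits(n1, n2, m1, m2):
    """Return (greatest, least) phi attainable with the given marginal totals."""
    total = n1 + n2
    scale = total / sqrt(n1 * n2 * m1 * m2)   # phi = x * N / sqrt(n1 n2 m1 m2)
    x_max = min(n1 * m2, n2 * m1) / total     # empties one off-diagonal cell
    x_min = -min(n1 * m1, n2 * m2) / total    # empties one diagonal cell
    return x_max * scale, x_min * scale

# Marginal totals of the table considered in the text.
upper, lower = phi_limits(99530, 470, 2280, 97720)
```

Applied to the marginal totals 5000, 5000, 9772, 228 the same function gives limits close to +.1527 and −.1527, and for a median division combined with a 10 per cent. dichotomy it gives limits of one-third in either direction.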


* Journal of R. S. Soc. Vol. LXXV. p. 604.

Mr Yule’s association coefficient is for the two cases


Q = 1 and Q = −1.


In the first case the tetrachoric $r_t$ is +1.0 and in the second tetrachoric
$r_t$ = −1.0, since h = 2.0 and k = 2.6*. As another illustration take


5000 0 5000 





4772 | 228 | 5000 





9772 | 228 | 10000 


This table gives the greatest positive φ for the given marginal frequencies; and
with these frequencies φ can never rise above the value +.1527 which it has for
this table. But both Q and tetrachoric $r_t$ = +1.0.


The minimum value of φ for the same total frequencies is given by the table


4772 | 228 5000 





5000 | 0 5000 





9772 | 228 | 10000 


This has φ = −.1527, while Q = −1 and tetrachoric $r_t$ = −1. In other words,
φ is restricted to lie between +.1527 and −.1527, while Q and tetrachoric r may
pass through the whole range +1.0 to −1.0. Why has Mr Yule not pointed out
these facts when recommending φ as obviously suitable for all fourfold tables when
we get rid of normal variation? Clearly, if he had done so his criticism of
uncorrected contingency would have been shown to apply with still greater force
to his own coefficient. The standard deviations of the variates of the fourfold
table

a | b
c | d

are not given by $\frac{\sqrt{(a+b)(c+d)}}{N} A$ and $\frac{\sqrt{(a+c)(b+d)}}{N} A'$, and the
product moment by $\frac{ad - bc}{N^2} AA'$, unless we may concentrate each variate into
points at distances $A$ and $A'$ which Mr Yule takes as units. But it is clear that
when we have done this we (i) have fixed the standard deviations of the variates,
(ii) can shift our dichotomic lines throughout the whole ranges of $A$ and $A'$ without
influencing the result. If the variates are really not concentrated into points
their standard deviations are wholly independent of the dichotomic lines, and
every shifting of those lines will change the proportions of each variate falling into
the two categories. The independence of the standard deviations of the dichotomic
lines is the advantage of the tetrachoric r over φ; and in practical Mendelian
statistics it is in most cases impossible to shift the dichotomic lines without


* Actually h = 1.999087 and k = 2.597180. As we have already indicated and shall further emphasise
in Appendix I, the value of tetrachoric $r_t$ is indeterminable by the usual method in such cases.

modifying the frequencies. This is the ground we have had for applying φ to
theoretical but not to practical Mendelism*.


A further illustration of this limitation of φ for given marginal frequencies of
the fourfold is provided in the accompanying Diagram V (p. 217). Here the range of
values possible for the Boas-Yulean is given for the special case where one variate
has a median division, and the percentage at which the dichotomy of the other
variate takes place is given on the horizontal line; for example, for a 10% dichotomy
φ must lie between ±.3333. We do not ourselves lay stress on this limitation of
the range of values in the Boas-Yulean, but if it be a defect of the coefficient
of mean square contingency that for a fourfold table its value cannot exceed
.707, it is also a defect of the coefficient recommended by Mr Yule that it also has
a limited range for given marginal frequencies, a limitation not shared by the
tetrachoric coefficient or even Mr Yule’s coefficient of association.


(10) The Coefficient of Contingency. 


We do not propose to take up at great length a defence of this coefficient 
because one of us has had for some years a memoir on the subject in hand which 
will soon see the light of day. But Mr Yule’s criticisms arise from two sources,
(i) from his disregard of corrections which practice has taught us were needful and
which have been known for some time, (ii) from his obvious want of that confidence
in the method which arises from long experience of its applicability.


The corrections needed are (a) those due to number of cells, and (b) the
correction for class-index. If κ = number of rows, λ = number of columns, then on
the average of many random samples the correction for number of cells is


* We have the following results for the small-pox data:

District | Boas-Yulean φ | Possible range of φ for given frequencies
Sheffield | .531 | +.9181 to −.1221
Leicester | .249 | +.2806 to −.2228
Homerton-Fulham | .423 | +.8101 to −.2301

How would Mr Yule compare these values of φ with each other or with those of r from continuous
frequencies, which can range from −1 to +1, or again with a Boas-Yulean φ from such tables as

499,200 | 800 | 500,000
498,306 | 1694 | 500,000
997,506 | 2494 | 1,000,000
φ = +.02, possible range +.05 to −.05,

499,988 | 12 | 500,000
499,987 | 13 | 500,000
999,975 | 25 | 1,000,000
φ = +.0003, possible range +.005 to −.005?

The bulk of the mental defect and blindness data considered by Mr Yule has for φ total possible
ranges varying from .4 to .6 on the positive side and .006 to .004 on the negative side. How can
the resulting coefficients be intercomparable?

(κ − 1)(λ − 1)/N to be subtracted from φ². This is the chief but is not the only
correction for number of cells. It is, however, the one of most importance for our
present purpose. It must only be applied when our material can be looked upon
as a random sample. It should not be used of course when our material is an
actual theoretical frequency surface, and not a random sample from such a surface.
The second correction is for the use of class-indices in grouping. The theory of
this correction is discussed in an earlier paper in this number of Biometrika
(Vol. IX. p. 116), where it has been detached from the memoir in preparation on
contingency in order to indicate certain fallacies in Mr Yule’s statistical theories.


Diagram V. Maximum and minimum values of φ (or $r_{hk}$) for a 50% division of one category and
various percentage divisions of the other.
The area inside the curved figure contains all the possible values of φ.


Each variate must be corrected independently for the use of broad categories, by 
calculating the correlation of the variate with its class-index. In order to test the 
efficiency of the coefficient of contingency for a variety of groupings an arbitrary 
series of groups must first be selected to work upon. We choose the groupings of 
the eye-colour data published by Pearson and Lee for father and son as being 
perfectly arbitrary groupings fixed before any controversy arose on this subject 
and therefore clearly not selected to obtain favourable or unfavourable results.
These frequencies are

Group | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | Totals
Father | 36 | 322 | 264 | 180 | 5 | 64 | 101 | 28 | 1000
Son | 34 | 301 | 284 | 137 | 5 | 100 | 98 | 41 | 1000





The reader should remember that the eye-colour groups are as follows: 


1 = light blue,              5 = light brown,
2 = blue, dark blue,         6 = brown,
3 = blue-green, hazel,       7 = dark brown,
4 = dark grey, hazel,        8 = very dark brown, black.


Group 5 has been usually clubbed with group 6 because in 5 only five fathers 
and five sons occur. 


The following table gives the class-index correlations for various groupings:

TABLE XI.
Table of Class-Index Correlations*.

Order of Table | Division | Class Correction, Fathers | Class Correction, Sons | Corrective Dividing Factor
1st 7 × 7 | 1 : 2+3 : 4 : 5 : 6 : 7 : 8 | .9010 | .9011 | .8118,9110
2nd 7 × 7 | 1 : 2 : 3 : 4 : 5+6 : 7 : 8 | .9624 | .9645 | .9282,3480
1st 6 × 6 | 1 : 2+3 : 4 : 5+6 : 7 : 8 | .9009 | .9010 | .8117,1090
2nd 6 × 6 | 1 : 2 : 3 : 4 : 5+6 : 7+8 | .9542 | .9553 | .9115,4726
1st 5 × 5 | 1 : 2+3+4 : 5+6 : 7 : 8 | .8070 | .8352 | .6740,0640
2nd 5 × 5 | 1+2 : 3 : 4 : 5+6 : 7+8 | .9259 | .9296 | .8607,1664
1st 4 × 4 | 1 : 2+3+4 : 5+6 : 7+8 | .7971 | .8246 | .6572,8866
2nd 4 × 4 | 1+2 : 3+4 : 5+6 : 7+8 | .9054 | .9144 | .8278,9776
3rd 4 × 4 | 1+2 : 3 : 4 : 5+6+7+8 | .9156 | .9130 | .8359,4280
1st 3 × 3 | 1+2+3 : 4 : 5+6+7+8 | .8253 | .8191 | .6760,0323
2nd 3 × 3 | 1+2 : 3+4 : 5+6+7+8 | .8949 | .8975 | .8031,7275
3rd 3 × 3 | 1 : 2+3+4+5+6+7 : 8 | .5669 | .5968 | .3383,2592





The reader should note that while the corrective dividing factor on the whole 
gets smaller and smaller as the classes get fewer and fewer, yet it is not possible 
to assert a priori that more classes will have a higher factor than fewer classes, 
e.g. our 1st 7 × 7 table has a lower corrective factor than our 3rd 4 × 4 table, and
our 2nd 3 × 3 table a higher factor than our 1st 5 × 5 table. This point is
important in reference to a criticism we shall make later of Mr Yule’s statistical 
methods. 
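
The corrective dividing factor in Table XI is the product of the two class-index correlations, and the two comparisons just cited can be checked directly. The following is a sketch only; the dictionary layout and the name `dividing_factor` are illustrative and are not taken from the paper.

```python
# Class-index correlations (fathers, sons) for four groupings of Table XI.
class_corrections = {
    "1st 7x7": (0.9010, 0.9011),
    "3rd 4x4": (0.9156, 0.9130),
    "1st 5x5": (0.8070, 0.8352),
    "2nd 3x3": (0.8949, 0.8975),
}

def dividing_factor(r_fathers, r_sons):
    """Corrective dividing factor: product of the two class-index correlations."""
    return r_fathers * r_sons

factors = {name: dividing_factor(*pair) for name, pair in class_corrections.items()}

# More classes need not mean a higher factor.
seven_below_four = factors["1st 7x7"] < factors["3rd 4x4"]
three_above_five = factors["2nd 3x3"] > factors["1st 5x5"]
```

Both comparisons come out true for the tabled values, which is the point made in the text: the number of classes alone does not fix the size of the factor.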


* The justification for obtaining the class-index correlation by determining the mean by the
Gaussian hypothesis is given later in this paper; see also Biometrika, Vol. IX. pp. 127, 139, etc.

Taking this system of classification we divided up a Gaussian surface of
.5 correlation into the same groups, and also a Gaussian surface of .3 correlation.
We publish these surfaces below*.


TABLE XII.

Gaussian Surface for r = .5 in Eye-Colour Groupings.

Group | 1 | 2 | 3 | 4 | 5+6 | 7 | 8 | Totals
1 | 7.38 | 19.85 | 4.94 | 1.38 | 0.26 | 0.18 | 0.01 | 34
2 | 20.58 | 145.47 | 78.94 | 35.98 | 9.72 | 9.27 | 1.04 | 301
3 | 6.01 | 93.63 | 85.41 | 54.34 | 18.59 | 22.33 | 3.69 | 284
4 | 1.26 | 31.81 | 39.49 | 31.03 | 12.29 | 17.36 | 3.76 | 137
5+6 | 0.53 | 18.11 | 27.79 | 25.14 | 11.09 | 17.62 | 4.72 | 105
7 | 0.22 | 11.02 | 21.59 | 23.66 | 11.86 | 21.89 | 7.76 | 98
8 | 0.02 | 2.11 | 5.84 | 8.47 | 5.19 | 12.35 | 7.02 | 41
Totals | 36 | 322 | 264 | 180 | 69 | 101 | 28 | 1000
TABLE XIII.

Gaussian Surface for r = .3 in Eye-Colour Groupings.

Group | 1 | 2 | 3 | 4 | 5+6 | 7 | 8 | Totals
1 | 4.04 | 17.16 | 7.55 | 3.30 | 0.91 | 0.92 | 0.12 | 34
2 | 17.41 | 123.59 | 79.76 | 44.64 | 14.61 | 17.67 | 3.32 | 301
3 | 8.86 | 93.00 | 78.31 | 52.04 | 19.20 | 26.40 | 6.19 | 284
4 | 2.83 | 37.73 | 37.24 | 27.51 | 10.95 | 16.31 | 4.43 | 137
5+6 | 1.62 | 25.21 | 27.75 | 22.09 | 9.26 | 14.64 | 4.43 | 105
7 | 1.02 | 19.50 | 24.47 | 21.39 | 9.58 | 16.36 | 5.68 | 98
8 | 0.22 | 5.81 | 8.92 | 9.03 | 4.49 | 8.70 | 3.83 | 41
Totals | 36 | 322 | 264 | 180 | 69 | 101 | 28 | 1000





In applying the method of contingency to these two tables, no correction for
mean of φ² in the random sample should be made; they are actual surfaces and not
random samples from these surfaces. Further in order to measure what effect
dealing only with round numbers in the cells would make we replaced the first
table by the following Table XIV in which the decimals were cut off and a slight
adjustment made to preserve the total variate frequencies. This table is published
so that the reader can judge on what we worked.


TABLE XIV.

Gaussian Surface for r = .5 adjusted to give whole units in cells.

Group | 1 | 2 | 3 | 4 | 5+6 | 7 | 8 | Totals
1 | 7 | 20 | 5 | 2 | - | - | - | 34
2 | 21 | 145 | 79 | 36 | 10 | 9 | 1 | 301
3 | 6 | 94 | 85 | 54 | 19 | 22 | 4 | 284
4 | 2 | 32 | 39 | 31 | 12 | 17 | 4 | 137
5+6 | - | 18 | 28 | 25 | 11 | 18 | 5 | 105
7 | - | 11 | 22 | 24 | 12 | 22 | 7 | 98
8 | - | 2 | 6 | 8 | 5 | 13 | 7 | 41
Totals | 36 | 322 | 264 | 180 | 69 | 101 | 28 | 1000


This working to units cannot be expected to give quite as good a result as
working to two decimal places, but it is more consonant with an actual table.


Next we took Pearson’s Family Data cards and arranged 1000 cases of Father
and Son, first in order of magnitude of Father’s stature, next in order of Son’s
magnitude of stature and divided up the frequencies of Father’s and Son’s stature
exactly in the eye-colour groups. The following contingency table resulted.


* Mr Yule states that he has divided up the .3 Gaussian surface in a somewhat similar manner, but
he does not publish his table, and it is therefore impossible to test his results. We should like here
to enter a protest against this procedure, which recurs in Mr Yule’s memoir, and throws an immense
amount of unnecessary arithmetic on any one traversing Mr Yule’s arguments.


TABLE XV.
Stature of Father and Son in Eye-Colour Groups.
Stature of Father.

Stature of Son | 1 | 2 | 3 | 4 | 5+6 | 7 | 8 | Totals
1 | … | … | … | … | … | … | … | 34
2 | 23 | 154 | 84 | 26 | 2 | … | … | 301
3 | 8 | 87 | 75 | 66 | 22 | 24 | 2 | 284
4 | 1 | 29 | 36 | 37 | 14 | 14 | 6 | 137
5+6 | - | 18 | 27 | 26 | 11 | 18 | 5 | 105
7 | … | … | 26 | 19 | … | … | … | 98
8 | … | … | … | … | … | … | … | 41
Totals | 36 | 322 | 264 | 180 | 69 | 101 | 28 | 1000





We have thus one table which for practical purposes is absolutely Gaussian, 
one Gaussian table modified to give units in the cells, and one table typical of 
what occurs in, perhaps, 9 out of 10 cases in every-day statistics. 


The next table gives the results obtained by the method of mean square 
contingency with the appropriate corrections. It will be seen that 3x3 tables 
give as good results as 7 x7 tables and the method is thus justified for Gaussian 
material and for the bulk of such tables as occur in statistical practice. 


It will be seen not only how closely the mean of the contingency values 
agrees with the product-moment value of the correlation, but how little the 
individual values differ from the mean. Worse cases may possibly be found by 
those who go to seek them by extreme divisions—we have taken them as they 
came, and feel convinced by a wide experience that contingency gives in practice 
remarkably satisfactory results. 


Order of Table | Classes Grouped | Stature, Father and Son | Gaussian Surface r = 0.5 | Gaussian Surface r = 0.3
2nd 7 × 7 | 1 : 2 : 3 : 4 : 5+6 : 7 : 8 | .49 | .49 | …
2nd 6 × 6 | 1 : 2 : 3 : 4 : 5+6 : 7+8 | .49 | .48 | .30
2nd 5 × 5 | 1+2 : 3 : 4 : 5+6 : 7+8 | .52 | .48 | …
2nd 4 × 4 | 1+2 : 3+4 : 5+6 : 7+8 | .52 | .49 | …
3rd 4 × 4 | 1+2 : 3 : 4 : 5+6+7+8 | .51 | .48 | …
2nd 3 × 3 | 1+2 : 3+4 : 5+6+7+8 | .51 | .48 | …
1st 3 × 3 | 1+2+3 : 4 : 5+6+7+8 | .52 | .51 | .30
Mean by Contingency | | .5080* | .4884† | .3017‡
Product-Moment Value | | .5189§ | .5000 | .3000
7 × 7 Product Moment [Classes concentrated at Gaussian means and corrections used¶] | | .5231 | .5023 | .3005


We now take a step further and ask, if we depart from the ordinary run of cases
and pick out skew distributions, can we place equal reliance on the contingency
method to give results of practical value? By practical value we mean results
within .05 of the true value of the correlation, for very small weight is given in
practical statistics to deviations of less than this order. We cannot do better in
answering this problem than by taking the very surfaces (some of which were
originally selected by Pearson to illustrate extreme non-Gaussian material) which
Mr Yule has gone out of his way to collect, for they are very far from random
samples of average statistical experience.


These cases are (i) a hypothetical Mendelian surface constructed by Pearson 
and noted by him as skew at the time, (ii) the barometric table for Laudale and 
Southampton, (iii) the ages of husband and wife, (iv) the length of ivy leaves in 
various stages of growth—all cases selected as tests by Mr Yule. 


The following Tables give the data as we have used them. It will be seen 
from this material (i) how wide is the divergence from Gaussian type and (ii) what 
a large range of diverse classifications have been used. 


There is here extreme deviation from the Gaussian type, the arrays have every
variety of skewness from the J-shaped curve of the zero couplets to the normal
symmetry of the four couplets arrays. The actual correlation as found by the
product-moment method is 1/3.

Now in this case there is (i) no corrective factor for random sampling as the
Table is a theoretical table and not a random sample from such a table, (ii) there
is no correction for class-indices because the class-indices are the actual values.


* The 1st 4 × 4 and 1st 5 × 5 were also worked out and gave respectively .5151 and .5174.
† The slightly greater divergence from the true value here was we believe due to the adjustment
of the table to unit frequencies.
‡ The 1st 6 × 6 and the 1st 5 × 5 tables were also worked out and gave .3049 and .3089 respectively.
§ The original table of Father and Son with 1078 entries gave r = .5140; see Biometrika, Vol. II.
p. 378.
¶ Class-index corrections made: see Biometrika, Vol. IX. p. 128.

We have φ² = .1251373 and $C_2$ = .3335 against actual r = .3333. A better illustration
of the value of $C_2$ properly used could hardly be imagined.
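
The step from φ² to the coefficient of contingency can be sketched as follows. This assumes the usual relation C₂ = √(φ²/(1 + φ²)), which is not restated in this section but reproduces the figures quoted here; the function names are illustrative only. The correction for number of cells is the (κ − 1)(λ − 1)/N term given earlier, and by the argument above it is not applied to the Mendelian table.

```python
from math import sqrt

def corrected_phi_sq(phi_sq, rows, cols, n):
    """Subtract the chief correction for number of cells, (rows-1)(cols-1)/N.

    Only meaningful when the table is a random sample, not a theoretical surface.
    """
    return phi_sq - (rows - 1) * (cols - 1) / n

def contingency_coefficient(phi_sq):
    """Coefficient of mean square contingency, C2 = sqrt(phi^2 / (1 + phi^2))."""
    return sqrt(phi_sq / (1.0 + phi_sq))

# Mendelian table: theoretical surface, so phi^2 is used uncorrected.
c2_mendel = contingency_coefficient(0.1251373)
```

With φ² = 1, the largest value a fourfold table can give, the same function returns about .707, the ceiling referred to in the discussion of the fourfold table.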
TABLE XVI.

Mendelian Inheritance of Recessive Couplets on Pearson’s Theory*.
Case of Four Couplets.

Number of Recessive Couplets in Parent.

Offspring | 0 | 1 | 2 | 3 | 4 | Totals
0 | 625 | 500 | 150 | 20 | 1 | 1296
1 | 500 | 800 | 360 | 64 | 4 | 1728
2 | 150 | 360 | 276 | 72 | 6 | 864
3 | 20 | 64 | 72 | 32 | 4 | 192
4 | 1 | 4 | 6 | 4 | 1 | 16
Totals | 1296 | 1728 | 864 | 192 | 16 | 4096
| _ : ' 











We now take:

TABLE XVII.
4 × 4 Table for Homotyposis in Ivy Leaves†.
First Leaf.

Second Leaf | Under 6.95 | 6.95–10.95 | 10.95–14.95 | Over 14.95 | Totals
(1) Under 6.95 | 3358 | 3133 | 452 | 41 | 6984
(2) 6.95–10.95 | 3133 | 13566 | 7497 | 1124 | 25320
(3) 10.95–14.95 | 452 | 7497 | 10474 | 2889 | 21312
(4) Over 14.95 | 41 | 1124 | 2889 | 2330 | 6384
Totals | 6984 | 25320 | 21312 | 6384 | 60000


the original table with use of Sheppard’s correction it is found to give .567. The
Table is extremely skew, and owing to its method of construction (i.e. taking pairs
of leaves out of 25 gathered from each of 100 plants) it is markedly lumpy.
The massing into a 4 × 4 table has removed this lumpiness and the result probably
represents the organic relation between leaves of ivy from the same spray better
than the original table. We find:

φ² = .353,307 when corrected for random sampling, $C_2$ = .51095, and the class-index
correlation given by $r^2_{xC_x}$ = .870,404, which leads to .587 for the corrected
contingency. No investigation has been made of the linearity of the regression,
but it seems to us that the value of the correlation in its general sense may be
even better given by the .587 of the contingency method than by the .567
of the product-moment method. Anyhow the difference is of no practical significance;
and we again see that contingency applies effectively to skew material.
We next take into consideration a 7 × 7 table, the barometer data from
Laudale and Southampton.

* Phil. Trans. Vol. 203 A, p. 60. Here is another illustration of Mr Yule’s peculiar methods. This
table takes some calculating, but Mr Yule does not give the table, so that his results might be verified.
Further any reader of his memoir would suppose that Pearson had applied on this occasion or on
other occasions tetrachoric $r_t$ to tables of this character, whereas the exact contrary is the fact, Pearson
having been the first to apply product-moment formulae to theoretical Mendelian data, including this case.
† Phil. Trans. Vol. 197 A, p. 351.


TABLE XVIII.




















7 × 7 Table for Barometer Heights at Laudale and Southampton.
Southampton.

Laudale | (1) | (2) | (3) | (4) | (5+6) | (7) | (8) | Totals
(1) Over 30.45 | 50 | 63.75 | 2.25 | - | - | - | - | 116
(2) 30.45–30.05 | 42 | 503.25 | 248.5 | 64.5 | 6.25 | - | - | 864.5
(3) 30.05–29.75 | - | 193.5 | 340 | 221.25 | 39.5 | 41.75 | 2.5 | 838.5
(4) 29.75–29.55 | - | 35 | 120.25 | 169 | 45 | 72.75 | 15 | 457
(5+6) 29.55–29.35 | - | 4.5 | 49.5 | 117.75 | 63.25 | 77.25 | 14.75 | 327
(7) 29.35–28.95 | - | - | 17.5 | 54 | 46 | 102.25 | 46.75 | 266.5
(8) Below 28.95 | - | - | - | 1 | 1 | 20 | 30.5 | 52.5
Totals | 92 | 800 | 778 | 627.5 | 201 | 314 | 109.5 | 2922
| | | escae ic 











This is a singularly unfavourable table for contingency methods for it is a well-known
rule in practical working to avoid cells whose actual frequencies or those
of independent probabilities are zero or a few units. We should therefore anticipate
getting the best results in such cases from few divisions in which cells with zero
or small entries rarely occur. We have numbered the divisions to correspond
with the eye-colour data nomenclature.

Order of Table | Division | Mean Square Contingency $C_2$ | Southampton $r_{xC_x}$ | Laudale $r_{yC_y}$ | Corrected Contingency
7 × 7 | 1 : 2 : 3 : 4 : 5+6 : 7 : 8 | .69791 | .9667 | .9659 | .75
6 × 6 | 1 : 2 : 3 : 4 : 5+6 : 7+8 | .67755 | .9570 | .9596 | .74
5 × 5 | 1+2 : 3 : 4 : 5+6 : 7+8 | .63210 | .9345 | .9324 | .73
1st 4 × 4 | 1+2 : 3+4 : 5+6 : 7+8 | .60952 | .9089 | .9136 | .73
2nd 4 × 4 | 1+2 : 3 : 4 : 5+6+7+8 | .61368 | .9238 | .9164 | .72
1st 3 × 3 | 1+2 : 3+4 : 5+6+7+8 | .58899 | .8979 | .8973 | .73
2nd 3 × 3 | 1+2+3 : 4 : 5+6+7+8 | .55020 | .8450 | .8215 | .79
3rd 3 × 3 | Extra* | .57580 | .8823 | .8736 | .75
4th 3 × 3 | Extra* | .45500 | .7972 | .7540 | .76
Mean | | | | | .744
Data classified to .1″ give Product Moment r = .780†
| 


* Two cases of this for 3 × 3-fold tables had already been worked out by Pearson in his paper in the
earlier part of this number and are cited here; see Biometrika, Vol. IX. pp. 136–7.

† Given as .757 in the original paper, Phil. Trans. Vol. 190 A, p. 455, which antedated the
publication of the correct ‘Sheppard’s’ corrections.

Here again for practical purposes the mean square contingency gives quite
good results. It is very unlikely that in practical statistical work stress would be
laid on a difference of .05 occurring in a correlation of this magnitude.
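
The corrected values for the barometer data follow from dividing each C₂ by the product of the two class-index correlations. The sketch below assumes that reading of the table; the name `corrected_contingency` is illustrative, and the 4th 3 × 3 case is left out because its Laudale class-index figure is not fully legible in the printed table.

```python
# (C2, Southampton class-index correlation, Laudale class-index correlation)
barometer = [
    (0.69791, 0.9667, 0.9659),  # 7x7
    (0.67755, 0.9570, 0.9596),  # 6x6
    (0.63210, 0.9345, 0.9324),  # 5x5
    (0.60952, 0.9089, 0.9136),  # 1st 4x4
    (0.61368, 0.9238, 0.9164),  # 2nd 4x4
    (0.58899, 0.8979, 0.8973),  # 1st 3x3
    (0.55020, 0.8450, 0.8215),  # 2nd 3x3
    (0.57580, 0.8823, 0.8736),  # 3rd 3x3
]

def corrected_contingency(c2, r_x, r_y):
    """Contingency coefficient divided by both class-index correlations."""
    return c2 / (r_x * r_y)

corrected = [round(corrected_contingency(*row), 2) for row in barometer]
largest_gap = max(abs(0.780 - value) for value in corrected)
```

Every corrected value lies within .06 of the product-moment value .780, the widest gap being that of the 2nd 4 × 4 table.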


We now turn to the table of ages of Husband and Wife taken from the Census
of 1901. We have first arranged an 8 × 8-fold table in unequal frequency
groups, as follows:


TABLE XIX.
Age of Wife.

Age of Husband | (1) 15–20 | (2) 21–29 | (3) 30–39 | (4) 40–44 | (5) 45–49 | (6) 50–54 | (7) 55–64 | (8) 65– | Totals
(1) 15–24 | 44112 | 194620 | 4962 | 162 | 40 | 16 | 8 | 1 | 243921
(2) 25–34 | 15763 | 881320 | 590180 | 14046 | 2669 | 645 | 186 | 17 | 1504826
(3) 35–44 | 873 | 98200 | 910238 | 388586 | 78648 | 13986 | 3362 | 240 | 1494133
(4) 45–49 | 90 | 6584 | 85583 | 178474 | 251519 | 58699 | 12223 | 576 | 593748
(5) 50–54 | 43 | 2573 | 27000 | 56976 | 146990 | 194509 | 53322 | 2027 | 483440
(6) 55–59 | 29 | 1166 | 10275 | 17810 | 46192 | 110115 | 176506 | 7847 | 369940
(7) 60–69 | 16 | 788 | 6570 | 10233 | 21314 | 50238 | 261167 | 102356 | 452682
(8) 70– | 7 | 184 | 1257 | 1780 | 3290 | 6759 | 36251 | 125302 | 174830
Totals | 60933 | 1185435 | 1636065 | 668067 | 550662 | 434967 | 543025 | 238366 | 5317520
The following cases were investigated by contingency:

Order of Table | Division | Contingency uncorrected for classes | $r_{xC_x}$ | $r_{yC_y}$ | Corrected Contingency
6 × 6 | 1+2 : 3 : 4 : 5 : 6 : 7+8 | .7821 | .9470 | .9346 | .88
5 × 5 | 1+2 : 3 : 4 : 5+6 : 7+8 | .7677 | .9447 | .9327 | .87
4 × 4 | 1+2 : 3 : 4+5 : 6+7+8 | .7405 | .9312 | .9223 | .86
3 × 3 | 1+2 : 3+4+5 : 6+7+8 | .6995 | .8959 | .8954 | .87














The true value of the coefficient of correlation by product-moment method
= .925. These values in an extreme case of skewness, rendered more complex by
heterogeneity of the material*, so far from discrediting the method of contingency
are definitely in its favour. On material given by a 3 × 3-fold or 4 × 4-fold table
no reasoning is likely to be based which would be modified by a change of .03
to .05 in the correlation. We think that for even these exceptional cases, which
rarely occur without warning in practical statistics, the corrected contingency
will not lead any one astray.


* Skewness of wife’s ages = .71; skewness of husband’s ages = .76. The heterogeneity depends
on the fact that the table represents the ages of all married couples without regard to second or later
marriages.


What has Mr Yule to place against this method for tables 3 × 3-fold up to
6 × 6-fold in classification? He writes: “From several trials, more than are here
given, I have come to the tentative conclusion that the best guide to the correlation
that would be found for given data, if the grouping were other than that
which in fact it is, is the correlation for the existing grouping, provided that you
are given at least some five or six arrays” (loc. cit. p. 618). He does not venture
to inform us what he would do for a table of 3 × 3, or 4 × 4 cells! As a matter
of fact such tables can give even by Mr Yule’s method of pseudo-ranks better
results than the 5 × 5 or 6 × 6 groupings. Mr Yule is, however, not content with
his statement that his method is tentative; before he has done with it* he has
assumed that by increasing his classes he will approach a limit which is the true
product-moment correlation. As a matter of fact there is no such approach at
all; the Yulean method of pseudo-ranks may give a better result for a lower than
a higher number of cells, and if it did go far enough to reach a limit, it would
give the correlation of ranks and not that of true variates*. The fallacy of
Mr Yule’s arguments and the extreme inferiority of his method to that of contingency
will be manifest in the following comparisons. We, however, draw the
attention of the reader to this: that if the method of pseudo-ranks did approach
a limit it would be the correlation of ranks uncorrected for huge brackets, i.e. we
should still have to correct for passing from ranks to variates and for class-indices.


† Here are a few comparisons of results reached by the method of pseudo-ranks with those deduced
by contingency:

Order of Table | Barometer Results: Contingency | Barometer Results: Pseudo-Ranks | Age of Husband and Wife: Contingency | Age of Husband and Wife: Pseudo-Ranks | 0.5 Gaussian Correlation arranged in Father and Son Eye-Colour Groups: Contingency | 0.5 Gaussian: Pseudo-Ranks
7 × 7 | .75 | .73 | - | - | .49 | .46
6 × 6 | .74 | .73 | .88 | .89 | .48 | .45
5 × 5 | .73 | .72 | .87 | .88 | .48 | .44
1st 4 × 4 | .73 | .67 | .86 | .86 | .49 | .42
2nd 4 × 4 | .72 | .71 | - | - | .48 | .43
1st 3 × 3 | .73 | .66 | .91 | .83 | .48 | .41
2nd 3 × 3 | .79 | .65 | .84 | .81 | .51 | .40
3rd 3 × 3 | .75 | .67 | - | - | - | -
4th 3 × 3 | .76 | .52† | .87 | .79 | - | -
Mean | .744 | .674 | .890 | .845 | .488 | .431
True r | .780 | .780 | .925 | .925 | .500 | .500

It will be seen that the method of contingency, especially with few classes, is markedly better than
that of pseudo-ranks. We have purposely introduced the Age of Husband and Wife, because the
divisions there have ranges not very diverse in magnitude. In such cases the method of pseudo-ranks
becomes almost exactly that of the true correlation of variates, if the proper Sheppard’s correction be
made. In the case of Ages of Husband and Wife, if this correction be included, we are practically
finding the true correlation by the method of pseudo-ranks. It is remarkable that Mr Yule has not
drawn attention to this, because it at once indicates how fallacious the method is, if the subranges
be unequal.

* On the basis of what he starts by calling a tentative method, he then proceeds to assert that all
the biometric pigmentation work is “wholly untrustworthy” (loc. cit. p. 622), a characteristic illustration
of how Mr Yule’s mind rapidly grows obsessed by a theory which he has not properly investigated.

† This case is of remarkable interest as indicating the futility of Mr Yule’s method of pseudo-ranks.
The table although 3 × 3-fold has sensibly equal ranges. Therefore the correction to pass from ranks to
variates is closely Sheppard’s. The Yulean pseudo-ranks coefficient thus corrected is raised from .522
to .775, close to the true correlation!


Comparison of Method of Contingency with Method of Pseudo-Ranks.


Coefficients deduced from Table XV, Stature of Father and Son. Product-Moment
Correlation = .52.

Order of Table | Nature of Divisions | Pearson’s Contingency | Yule’s Pseudo-Ranks
7 × 7 | 1 : 2 : 3 : 4 : 5+6 : 7 : 8 | .49 | .48
6 × 6 | 1 : 2 : 3 : 4 : 5+6 : 7+8 | .49 | .48
1st 5 × 5 | 1 : 2+3+4 : 5+6 : 7 : 8 | .52 | .37
2nd 5 × 5 | 1+2 : 3 : 4 : 5+6 : 7+8 | .52 | .47
1st 4 × 4 | 1+2 : 3 : 4 : 5+6+7+8 | .51 | .46
2nd 4 × 4 | 1 : 2+3+4 : 5+6 : 7+8 | .52 | .36
3rd 4 × 4 | 1+2 : 3 : 4 : 5+6+7+8 | .51 | .46
1st 3 × 3 | 1+2+3 : 4 : 5+6+7+8 | .52 | .37
2nd 3 × 3 | 1+2 : 3+4 : 5+6+7+8 | .51 | .44














The inferiority of the method of pseudo-ranks will be obvious. The contingency
gives as good results for a 3 × 3 table as for a 5 × 5 table†; but for two different
tables of the same order the method of pseudo-ranks will give results differing by
as much as .10, ten times the difference of the contingency method.


Here is another Table‡, to which Mr Yule has applied his method of reaching
a limit to the actual correlation, namely that for eye-colour for pairs of brothers.


Mr Yule having so to speak ingeniously “dressed the window” to show a 
falling correlation of pseudo-ranks with his increase of classes, then asserts that this 
pseudo-rank correlation approaches a limit below 0°28, and extending his fallacious 
reasoning holds that this limit of ranks is the limit to the true correlation of 


* The fog in Mr Yule’s mind on this subject is well illustrated by his table on p. 619. He takes
a table for a Gaussian distribution of correlation 0.3 and says that with an infinite number of classes
the Yulean coefficient would become 0.3. There is not a trace of any knowledge on his part that
the limit of a process by which unit-range is given to each individual is not the same as that of a
process by which unit-area of the frequency curve is given to each individual. Here and elsewhere
he makes no distinction between the correlation of variates and the correlation of ranks. In the actual
case his limit would have been .2876 and not .3000, but far greater differences will arise if the
material be skew. We return to this point later.

† Like all statistical methods, that of contingency must be used with due regard to the data to
which it is applied and to the manner in which it is applied. Compound or heterogeneous material
may give a contingency coefficient differing considerably from true correlation, and groupings of great
inequality in the cells may render idle the corrective factor, e.g. if the great bulk of the material be
placed in one or two cells.
‡ Phil. Trans. Vol. 195 A, p. 140.

TABLE XX.
First Brother.

Second Brother | 1 | 2 | 3 | 4 | 5+6 | 7 | 8 | Totals
1 | 16 | 38 | 19 | 10 | 3 | 6 | 6 | 98
2 | 38 | 404 | 205 | 53 | 65 | 41 | 27 | 833
3 | 19 | 205 | 418 | 97 | 56 | 78 | 28 | 901
4 | 10 | 53 | 97 | 168 | 47 | 50 | 18 | 443
5+6 | 3 | 65 | 56 | 47 | 70 | 42 | 14 | 297
7 | 6 | 41 | 78 | 50 | 42 | 72 | 8 | 297
8 | 6 | 27 | 28 | 18 | 14 | 8 | 30 | 131
Totals | 98 | 833 | 901 | 443 | 297 | 297 | 131 | 3000
t i u 











variates. Now it is clear that there is no such general rule about the correlation 
of pseudo-ranks always moving in one direction. It is possible within certain 
limits to vary that correlation in an almost endless manner according to where 
we take our divisions, It 1s far more influenced by the size and position of our 
“brackets” than by whether we work with a 3 x 3-fold or a 7 x 7-fold classification. 
We can choose in this case a 3 x 3-fold table to give this spurious coefficient of 

Resemblance of Eye-Colour in Brothers. 

















Order of Table | Nature of Divisions        | Pearson’s Contingency | Yule’s Pseudo-Ranks
7 x 7          | 1 : 2 : 3 : 4 : 5+6 : 7 : 8 |         .51           |        .29
1st 6 x 6      | 1+2 : 3 : 4 : 5+6 : 7 : 8   |         .52           |        .29
2nd 6 x 6      | 1 : 2 : 3 : 4 : 5+6 : 7+8   |         .49           |        .29
5 x 5          | 1+2 : 3 : 4 : 5+6 : 7+8     |         .50           |        .30
1st 4 x 4      | 1+2 : 3 : 4 : 5+6+7+8       |         .51           |        .33
2nd 4 x 4      | 1+2+3 : 4 : 5+6 : 7+8       |         .53           |        .27
3rd 4 x 4      | 1+2 : 3+4 : 5+6 : 7+8       |         .44           |        .27
1st 3 x 3      | 1+2+3 : 4 : 5+6+7+8         |         .54           |        .29
2nd 3 x 3      | 1+2 : 3+4 : 5+6+7+8         |         .44           |        .30
3rd 3 x 3      | 1+2 : 3 : 4+5+6+7+8         |         .50           |        .36
Mean           |                             |         .498          |       >.28*





Mr Yule any value from .19 to almost .40. This flows from the fact that the
class-index correction may take a wide range of values according to the arrange-
ment of the classes, and Mr Yule makes no allowance whatever for this fact†.
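
The dependence of a coefficient on the choice of brackets can be tried numerically. The sketch below is a minimal illustration only: it merges classes of Table XX and recomputes the raw mean-square contingency coefficient, C = sqrt(φ²/(1 + φ²)). It does not apply the class-index or other corrections behind the printed contingency values, so its output is not expected to reproduce them. The helper names `merge` and `contingency` are hypothetical, and the two diagonal cells 16 and 72 are taken from the marginal totals and the symmetry of the table.

```python
import math

# Table XX: rows = second brother, columns = first brother,
# eye-colour classes 1, 2, 3, 4, 5+6, 7, 8 (cells 16 and 72 follow from the margins).
TABLE_XX = [
    [16, 38, 19, 10, 3, 6, 6],
    [38, 404, 205, 53, 65, 41, 27],
    [19, 205, 418, 97, 56, 78, 28],
    [10, 53, 97, 168, 47, 50, 18],
    [3, 65, 56, 47, 70, 42, 14],
    [6, 41, 78, 50, 42, 72, 8],
    [6, 27, 28, 18, 14, 8, 30],
]


def merge(table, groups):
    """Collapse a square table; `groups` lists the 0-based classes pooled together."""
    return [[sum(table[i][j] for i in gi for j in gj) for gj in groups]
            for gi in groups]


def contingency(table):
    """Raw (uncorrected) mean-square contingency coefficient of a table."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    phi2 = sum(table[i][j] ** 2 / (rows[i] * cols[j])
               for i in range(len(rows)) for j in range(len(cols))) - 1.0
    return math.sqrt(phi2 / (1.0 + phi2))


# The "3rd 3 x 3" division of the text: 1+2 : 3 : 4+5+6+7+8.
three_fold = merge(TABLE_XX, [[0, 1], [2], [3, 4, 5, 6]])
c_full = contingency(TABLE_XX)
c_three = contingency(three_fold)
print(f"7 x 7 raw contingency: {c_full:.3f}")
print(f"3 x 3 raw contingency: {c_three:.3f}")
```

Other groupings can be tried by changing the list passed to `merge`; pooling classes can only lower the raw coefficient, which is one reason a correction for the number of cells is needed before tables of different order are compared.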


* “Can we have any hesitation in similarly estimating the correlation for the eye-colour table, if we
were in a position to adopt a finer and more uniform grouping (without assuming that we will compel
that grouping to give us a normal distribution) as something slightly less than 0.28?” Yule, loc. cit.
p. 619.

† It is not possible to correct the Yulean pseudo-ranks correlation (i) for passing from ranks to
variates, because Mr Yule not only rejects any appeal to the Gaussian, for which we know the proper
correction, but because his assumption of unit ranges precludes the use of that curve, nor (ii) for class-
index correlation, because the same assumption hinders any rational method of finding the class-index
correlations.

We have seen that the corrected contingency gives a good practical approach
to the actual correlation, even if the material be skew as in the Husband and
Wife or in the Barometric Data; are we to argue from the entirely fallacious
reasoning of Mr Yule that the method is in this case on the average 44% in
error although in the skew Mendelian table it is only .06% in error and in
the Husband and Wife data only 3.8% in error? We shall need some far better
reasons for believing that the value usually assumed for the correlation of eye-
colour in brothers, i.e. circa .50, is in error, than such as Mr Yule seems able to
adduce. 


Another piece of remarkable special pleading on the same lines is that 
provided by Mr Yule in the case of Pearson’s tables for parental heredity in coat- 
colour in horses*. Those tables are remarkable in their nature, because although 
16 classes are formed, there are practically no entries except in the three main 
groups Brown, Bay and Chestnut. Here is the frequency distribution for sires, 
where bl. = black, br. = brown, b. = bay, ch. = chestnut, ro. = roan, gr. = grey : 











ia | ] ] | 
br. | br./b. | b./br. | b. | b.jeh. had ch. | ch./ro.|r0./ch. | ro. rl 


bl. | bl./br. | br./bl. | 




















| 
{ 


| 
ae 
209! 0 19 oot 0 | 0 |362! 1 0 





Now Mr Yule arranges this in 11 classes: 


| 7 | 5 | 209 | 19 





61 | 0 | 362 | 1 | o | 0 | ¢ | 
and also in three classes, presumably as : 
| 0 | — | 221 | — | 10 | — | 360 | —]o] —]o| 


He then obtains sensibly the same values for the two groupings and speaks 
of the equality of the pseudo-correlation of ranks thus obtained as marking in 
some way a limit to the correlation of the variates! Naturally he would obtain 
almost identical values, because the whole calculation of products and moments 
turns on the three dominating groups of brown, bay and chestnut and practically
all he has done in his arrangements of three groups and eleven groups is to call 
his sub-range unity in one case and two in the other! 


There is, we believe, only one classification possible of these tables on the 
reasonable assumption that the amount of pigment forms a continuous variate; 
namely that which makes a 3 x 3-fold division between brown, bay and chestnut


* Phil. Trans. Vol. 195 A, pp. 122 et seq. 

† I still see no error in my original classification by amount of pigment; there are more melanin
pigment granules in the brown hairs than in the bay, and,—if we disregard the black chestnuts, which 
are very rare among thoroughbreds as compared with hackneys,—more in the bay than in the chestnut; 
the latter colour depends more, and in some cases almost entirely, on diffused pigment. K. P. 


gr.[ro.| gr. 
and either excludes the few roans and greys or, as done below, throws them into
the lightest group. The threefold tables are then : 



































                    Sire.                                       Sire.
Colt.      79 | 132 |  47 |  258          Filly.      58 | 102 |  27 |  187
          105 | 426 | 132 |  663                      66 | 406 |  99 |  571
           37 | 152 | 190 |  379                      25 | 121 | 146 |  292
          221 | 710 | 369 | 1300                     149 | 629 | 272 | 1050

                    Dam.                                        Dam.
Colt.      82 |  73 |  23 |  178          Filly.      63 |  83 |  23 |  169
          101 | 319 |  99 |  519                     106 | 327 |  88 |  521
           33 | 129 | 141 |  303                      34 | 118 | 158 |  310
          216 | 521 | 263 | 1000                     203 | 528 | 269 | 1000

















And the corrected contingency coefficients are : 


Sire and Colt: .41,    Dam and Colt: .45,
Sire and Filly: .47,   Dam and Filly: .46,
Mean: .45.


Mr Yule states that the correlation is in this case of the order .33. Are we
again to assert that the mean of the contingency coefficients shows an error of
.12, or is 36% in excess of the true correlation, when as we have seen the error
we find in even extremely skew distributions is of the order of 4% and under?


Mr Yule says that the correlation of his pseudo-ranks is .33. Well and good,
then the correlation for variates would be about .35, probably more for skew
variation, and the factor for correction of class-indices about .80, or the true
correlation is of the order .35/.80 = .44, which brings it strangely nearer to the
value found by contingency than to Mr Yule’s 1/3!


It is quite true that the values obtained in the original memoir were higher 
than ‘45, but the theory of contingency was not then developed; when a fourfold 
table had to be formed there were very good reasons for dividing it in the way 
actually selected. In the first place the division between Chestnut and Bay was 
physiologically more reasonable than one between Brown and Bay, which would 
throw the Chestnuts into the Bays. In the next place it was the division nearest 
to the median and so liable to the least error from either random sampling or skew- 
ness. The other, the asymmetrical, divisions are less reasonable because they 
give one quadrant with only 2 to 3% of the total frequency in it, they divide
parent and offspring differently, and mix in one or other case Bays with Chestnuts.
Let us suppose with Mr Yule that these tables did show a correlation of 1/3 (which
they certainly do not), then we fail to grasp why Mr Yule should not get his
Mendelian 1/3 quite directly without elaborating an erroneous theory of pseudo-
ranks and using 3 x 3-fold and 11 x 11-fold tables to show approach to a limit.
The Mendelian 1/3 should come at once by the simple division of the tables into
Chestnut and not-Chestnut and it comes pretty closely indeed by taking only the
Chestnut groups of the 3 x 3-fold tables, as indeed it must do if we remember the
rarity with which a Chestnut x Chestnut gives other colours. Our four tables 















































become: 
                      Sire.                                          Sire.
            | N.-C. |  C.  | Totals                       | N.-C. |  C.  | Totals
Colt.  N.-C.|  742  | 179  |   921          Filly.  N.-C. |  632  | 126  |   758
       C.   |  189  | 190  |   379                  C.    |  146  | 146  |   292
      Totals|  931  | 369  |  1300                 Totals |  778  | 272  |  1050

                      Dam.                                           Dam.
            | N.-C. |  C.  | Totals                       | N.-C. |  C.  | Totals
Colt.  N.-C.|  575  | 122  |   697          Filly.  N.-C. |  579  | 111  |   690
       C.   |  162  | 141  |   303                  C.    |  152  | 158  |   310
      Totals|  737  | 263  |  1000                 Totals |  731  | 269  |  1000








The pseudo-rank correlations, φ, are:
Sire and Colt: .3094
Sire and Filly: .3414
Dam and Colt: .3030
Dam and Filly: .3638
Mean = .3294
Could a better demonstration of the Mendelian 1/3 correlation be possible?
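
The four values of φ can be checked directly from the fourfold tables. The following is a minimal sketch, assuming φ is the ordinary product-moment coefficient of a 2 x 2 table, (ad − bc)/√{(a+b)(c+d)(a+c)(b+d)}; the function name `fourfold_phi` is hypothetical, and the cell counts are those of the Chestnut and not-Chestnut tables above.

```python
import math


def fourfold_phi(a, b, c, d):
    """Product-moment coefficient of the fourfold table [[a, b], [c, d]]."""
    numerator = a * d - b * c
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return numerator / denominator


# Cells in the order: N.-C./N.-C., N.-C./C., C./N.-C., C./C.
tables = {
    "Sire and Colt": (742, 179, 189, 190),
    "Sire and Filly": (632, 126, 146, 146),
    "Dam and Colt": (575, 122, 162, 141),
    "Dam and Filly": (579, 111, 152, 158),
}

phis = {pair: fourfold_phi(*cells) for pair, cells in tables.items()}
mean_phi = sum(phis.values()) / len(phis)

for pair, value in phis.items():
    print(f"{pair}: {value:.4f}")
print(f"Mean: {mean_phi:.4f}")
```

Because φ treats the two classes as differing by a single unit, it takes no account of any gradation of pigment inside either class.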


Now let us look at a similar arrangement of other data in which the true 
correlation is actually known. We take the following cases with approximately 
similar total frequencies : 



































From Table XV.                                From Table XIV.

          Stature of Father.                        Gaussian Surface for .5.
        |  1–4  |  5–8  | Totals                  |  1–4  |  5–8  | Totals
 1–4    |  659  |   97  |   756           1–4     |  658  |   98  |   756
 5–8    |  143  |  101  |   244           5–8     |  144  |  100  |   244
 Totals |  802  |  198  |  1000           Totals  |  802  |  198  |  1000

Actual Correlation .518.                      Actual Correlation .500.
Yulean φ = .308.                              Yulean φ = .302.
From Table XIII.

          Gaussian Surface for .3.
        |  1–4    |  5–8    | Totals
 1–4    | 634.97  | 121.03  |   756
 5–8    | 167.03  |  76.97  |   244
 Totals | 802     | 198     |  1000

Actual Correlation .300.
Yulean φ = .167.

In other words for any surface approaching the Gaussian the Yulean φ for a
fourfold table with such frequencies in the total columns must be raised by .20 in
the scale of correlation. Let us consider the same sort of tables in non-Gaussian 


material, material of Mr Yule’s own choosing; we have the following fourfold 
tables : 
From Table XIX.

                              Age of Wife.
Age of Husband. |     a      |    b–f     |  Totals
 a              | 1,135,815  |   612,932  | 1,748,747
 b–f            |   110,553  | 3,458,220  | 3,568,773
 Totals         | 1,246,368  | 4,071,152  | 5,317,520

Actual Correlation .925.
Yulean φ = .686.
From Table XVI. Heredity of Mendelian Couplets.

               Father.                                   Father.
        |   0   |  1–4  | Totals                  |  0–1  |  2–4  | Totals
 0      |  625  |  671  |  1296           0–1     | 2425  |  599  |  3024
 1–4    |  671  | 2129  |  2800           2–4     |  599  |  473  |  1072
 Totals | 1296  | 2800  |  4096           Totals  | 3024  | 1072  |  4096

Actual Correlation .333.                      Actual Correlation .333.
Yulean φ = .243.                              Yulean φ = .243.
From Table XVIII. Barometer Heights.

               Southampton.                                Southampton.
        |   1–4    |   5–8   | Totals               |  1–2  |  3–8   | Totals
 1–4    | 2053.25  | 222.75  | 2276         1–2     |  659  |  321.5 |  980.5
 5–8    |  244.25  | 401.75  |  646         3–8     |  233  | 1708.5 | 1941.5
 Totals | 2297.5   | 624.5   | 2922         Totals  |  892  | 2030   | 2922

Actual Correlation .780.                      Actual Correlation .780.
Yulean φ = .531.                              Yulean φ = .566.





From Table XVII. Ivy Leaf Length.

                          First Leaf.
Second Leaf. | Under 6.95 | Over 6.95 | Totals
 Under 6.95  |    3358    |    3626   |   6984
 Over 6.95   |    3626    |   49390   |  53016
 Totals      |    6984    |   53016   |  60000

Actual Correlation .567.
Yulean φ = .412.

Whorls of Woodruff*.

                    Members of First Whorl.
             | Under 8 | 8 and over | Totals
 Under 8     |  7268   |    2281    |   9549
 8 and over  |  2281   |    1400    |   3681
 Totals      |  9549   |    3681    |  13230

Actual Correlation .173.
Yulean φ = .141.
Now let us put together these results in a Table:

Actual Correlation | Yulean φ for Fourfold | Difference | Percentage Increase on φ
       .925        |         .686          |    .239    |           35
       .780        |         .566          |    .214    |           38
       .780        |         .531          |    .249    |           47
       .567        |         .412          |    .155    |           38
       .518        |         .308          |    .210    |           68
       .500        |         .302          |    .198    |           66
       .333        |         .243          |    .090    |           37
       .300        |         .167          |    .133    |           80
       .173        |         .141          |    .032    |           23


















































Now these results bring out the important point that whether the distribution
of the frequency be Gaussian or not, we may have to add anything from 23% up
to 80%† to the value of φ as found from a fourfold table to obtain the true
correlation. For values about .33 we may have to add anything from 37% to
80%. For the tables for coat-colour in horses we must add to the Yulean at a
moderate estimate something like 40%, which gives a value, not near the .33 of
Mr Yule, but near our .46 and close to the value found by contingency or closer
to the values originally assigned by the tetrachoric r_t method.
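
The percentages quoted follow from the tabled pairs of actual correlation and Yulean φ as 100 (r − φ)/φ. The short sketch below recomputes that column and then applies the 40% figure to a φ of .33; the variable names are illustrative only, and the inputs are the three-decimal values printed in the table.

```python
# (actual correlation r, Yulean phi) for the nine fourfold tables discussed above.
pairs = [
    (0.925, 0.686), (0.780, 0.566), (0.780, 0.531),
    (0.567, 0.412), (0.518, 0.308), (0.500, 0.302),
    (0.333, 0.243), (0.300, 0.167), (0.173, 0.141),
]


def percentage_increase(r, phi):
    """Percentage by which phi must be raised to reach the actual correlation r."""
    return 100.0 * (r - phi) / phi


increases = [percentage_increase(r, phi) for r, phi in pairs]
for (r, phi), pct in zip(pairs, increases):
    print(f"r = {r:.3f}  phi = {phi:.3f}  difference = {r - phi:.3f}  increase = {pct:.0f}%")

# A phi of .33 raised by the "moderate estimate" of 40 per cent.
raised = 0.33 * 1.40
print(f"0.33 raised by 40% gives {raised:.3f}")
```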


If Mr Yule continues to assert that the true value of the correlation of these
tables is 1/3, then he may as well argue from the φ of the fourfold tables obtained


* Selected as a case of irregular and skew correlation of a low product-moment value; see Phil.
Trans. Vol. 197 A, p. 325.

† One Ivy Leaf distribution as below actually gave Yulean φ = .278 against actual correlation
.567, requiring a 104% increase on the Yulean!




















                           First Leaf.
              | Under 14.95 | Over 14.95 | Totals
 Under 14.95  |    49562    |    4054    |  53616
 Over 14.95   |     4054    |    2330    |   6384
 Totals       |    53616    |    6384    |  60000
by taking as classes Chestnut and non-Chestnut. But in that case his only
logical standpoint is to assert that inside the Chestnut and outside the Chestnut 
there are no hereditary differences of pigment, that Chestnut is a unit character 
that differs by a unit from all other shades in Bay and Brown. This, however, is 
not a fact as a very little microscopic examination of horse hair would show him. 
Pearson’s tables themselves indicate that excluding Chestnut, there is correlation 
of intensity of pigmentation between parent and offspring ; further unpublished 
material indicates that within the Chestnut and for different shades of it the like 
relation holds. If there be a correlation between parent and offspring inside and 
outside Chestnut the value for the fourfold tables in the Chestnut and non-Chestnut 
unit classes must be a minimum and not a maximum value as Mr Yule asserts, and 
his criticism collapses with the fallacies on which he has constructed it. 


The chief of these fallacies is the principle that stress of some kind can be laid 
on the Yulean, or pseudo-rank coefficient proceeding to a limit; the fact is that it 
can be made anything we please by a suitable choice of divisions. The divisions 
which for a given number of cells give it a maximum value are those which make 
the sub-range frequencies of the two variates equal. Every deviation from this 
equality lessens the value of the Yulean, whether the deviation consists in heaping 
up the frequency at one or both ends or in the middle of the variate range. 


In the following table are a few results for the pseudo-ranks coefficient for 
3 x 3-fold divisions of the Husband and Wife Table (see our p. 224). They fully 
substantiate the view that it can be made to take almost any value by a proper 
choice of grouping in the scale classes : 


Yulean | 





Wife Groups Husband Groups Coefficient 
le 2:3—8 1-6: 228 184 
| 1—6: ye As 2:3—8 254 
| 1-8: 4:5—8 1:2—7:8 268 
1:2—7:8 1—2 :3—4:5-—8 | ‘297 
1:2—7:8 1:2—7:8 B51 
1—4: 5 : 6—8 1—4: 5: 6—8 605 
1—2 :3—4 :5—8 1—2 : 3-4: 5—8 809 


| 
| 


The whole fallacy becomes at once obvious in the light of the class-index
correlation correction, for even with a 6x6 or 7x7 table wide changes may 
be made owing to changes in the classification influencing the class-index 
correlation*. 


Mr Yule has selected the following Series A for Brother-Brother’s eye-colour
in order to show that φ decreases to a limit of less than .28. But why should
he not have taken Series B in order to show that it increases to a value greater
than .28?

* Cf. Table XI, p. 218. 
Mr Yule accentuating Series A writes (loc. cit. p. 619) “The result emphasises
the entire non-normality of the eye-colour table. For the normal distribution 











Order of Table | Series A Divisions            | Yulean | Series B Divisions            | Yulean
2 x 2          | 1+2+3 : 4+5+6+7+8             |  .34   | 1+2+3+4+5+6 : 7+8             |  .16
3 x 3          | 1+2 : 3 : 4+5+6+7+8           |  .36   | 1+2+3+4+5+6 : 7 : 8           |  .17
4 x 4          | 1+2 : 3 : 4 : 5+6+7+8         |  .33   | 1+2+3+4 : 5+6 : 7 : 8         |  .20
5 x 5          | 1+2 : 3 : 4 : 5+6 : 7+8       |  .30   | 1+2+3+4 : 5 : 6 : 7 : 8       |  .21
6 x 6          | 1 : 2 : 3 : 4 : 5+6 : 7+8     |  .30   | 1+2+3 : 4 : 5 : 6 : 7 : 8     |  .2[illegible]
7 x 7          | 1 : 2 : 3 : 4 : 5+6 : 7 : 8   |  .29   | 1+2 : 3 : 4 : 5 : 6 : 7 : 8   |  .28
8 x 8          | 1 : 2 : 3 : 4 : 5 : 6 : 7 : 8 |  .28   | 1 : 2 : 3 : 4 : 5 : 6 : 7 : 8 |  .28





the correlation gradually increases towards the known true value as the number 
of arrays is increased: with five or eight arrays, notwithstanding the extreme 
irregularity of the grouping, we have the same moderately good approximation 
to the correlation as is given by the coefficient of contingency in this case*. 
For the eye-colour table the correlation decreases as the number of arrays is 
increased.”


The fact is we can select series of table divisions for some of which the 
Yuleans go up, for others they go down, and for still others first go up and then 
go down. That any series must ultimately reach .28 is obvious, because that is
the value of the only possible 8 x 8 table, but a 9 x 9 table might equally well
show .24, and then we suppose that taking the down series Mr Yule would have
asserted the limit to be .24. But suppose Mr Yule’s data had stopped at a
5 x 5 table, then according to the nature of that 5 x 5 classification, Mr Yule
might have found .21, or .33† as his limit, for all his tables of lower order must
have gone up or down to those limits! 


Mr Yule’s method if indefinitely continued would lead him to the correlation 
of ranks, but what relation this would have to the correlation of variates in cases 
of markedly skew variation with emphasised deviation from linearity no one at 
present is in a position to say. Some light, however, can be thrown on it by 
considering the actual deviations produced in calculating means by aid of Mr Yule’s 
hypothesis which places unit distance between each group of individuals, and 
ultimately between each individual‡. Let d be the range Mr Yule assumes
between each individual of a population n and R=(n—1)d the total range. Then 


* In this case, namely the Gaussian for .3 correlation, the Yulean gives .26 for both 5 x 5 and 8 x 8
tables, the corrected contingency gives .31 and .30 for these tables; this is Mr Yule’s idea of the
“same moderately good approximation”!

† Using the division 1 : 2 : 3 : 4+5+6+7 : 8.

‡ It is remarkable at this stage of statistical progress to find any one so incapable of appreciating
Galton’s work on first and second prizes as to replace the equal areas occupied by individuals by equal
ranges!
it is obvious that Mr Yule’s mean character will always coincide with his median,
or his

M = mean = character of individual with minimum value of variate + ½ R,

\[ \sigma = (\text{standard deviation}) = \sqrt{\tfrac{1}{12}\left(1 + \tfrac{2}{n-1}\right)}\; R, \]

or, if n be at all large,

\[ \sigma = R/(2\sqrt{3}). \]

It will be clear that for large numbers

\[ \sigma = \frac{\text{maximum value} - \text{minimum value of variate}}{2\sqrt{3}}, \]


a relation entirely opposed to the practical independence of range and standard 
deviation in variates with which we are familiar*. But another difficulty at once 
arises ; in actual practice the subranges have all sorts of different values, and we 
may know one or more of them. Which one of these subranges is to be taken as 
the standard unit and have the range expressed in terms of it? Some numerical 
illustrations will emphasise the extraordinary difficulties, not to say contra- 
dictions, of Mr Yule’s process of treating subranges as equal units in determining 
correlation. 
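
The dependence of mean and standard deviation on the range alone, under the equal-spacing hypothesis, is easily verified. The sketch below is only an illustration: it places n individuals at equal intervals between a minimum and a maximum, computes the mean and standard deviation directly, and compares the latter with the closed form R√{(n + 1)/(12(n − 1))}, which tends to R/(2√3). The function names and the trial numbers are assumptions made for the example.

```python
import math


def equal_spacing_moments(minimum, maximum, n):
    """Mean and standard deviation of n individuals spaced equally over the range."""
    step = (maximum - minimum) / (n - 1)
    values = [minimum + k * step for k in range(n)]
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    return mean, math.sqrt(variance)


def closed_form_sd(total_range, n):
    """Standard deviation of n equally spaced individuals spanning total_range."""
    return total_range * math.sqrt((n + 1) / (12.0 * (n - 1)))


# Trial values chosen only for illustration.
mean, sd = equal_spacing_moments(0.0, 10.0, 11)
print(f"mean = {mean:.4f}, sd = {sd:.4f}, closed form = {closed_form_sd(10.0, 11):.4f}")

# For a large population the standard deviation is sensibly R / (2 * sqrt(3)).
limit = 1.0 / (2.0 * math.sqrt(3.0))
print(f"n = 10**6: {closed_form_sd(1.0, 10**6):.6f}  limit: {limit:.6f}")
```

Whatever the actual form of the frequency, the mean so found is the mid-point of the range and the standard deviation a fixed fraction of it.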


For example, the Registrar-General gives ages of Husband and Wife from 15 
to about 100. Hence by Mr Yule’s method : 


Mean Age of Husband = 57.5 years,
Mean Age of Wife = 57.5 years,
Standard Deviation, Husband = 18.7639 years,
Standard Deviation, Wife = 18.7639 years.


The actual values are:


Mean Age of Husband = 42.8306 years,
Mean Age of Wife = 40.5838 years,
Standard Deviation, Husband = 13.0649 years,
Standard Deviation, Wife = 12.6813 years.


Thus the Yulean values may differ by 30% to 50% from the true values.


To assume that the skew distribution is Gaussian will give much better results 
than this. Let us illustrate it on the very skew Husband and Wife, Barometer, 
and Ivy Leaf data. In dealing with actual data, we have to express the means of 
a variety of arrays in terms of a known subrange common to them all, e.g., bay 
colour in horses or hazel eyes in men. We will apply in succession Mr Yule’s 
hypothesis and the normal curve to the above data. 


* For such frequencies as occur in practice it is much safer to take the range about 6 times rather
than 3.5 times the standard deviation.


Ages of Husbands in the Eight Groups of Table XIX.

True Mean Age of Husband 42.8306.

                  |     By Gaussian     |      By Yulean
Found from Group  |  Value | Deviation  |  Value | Deviation
       2          |  38.57 |   −4.26    |  46.32 |   +3.49
       3          |  41.14 |   −1.69    |  46.32 |   +3.49
       4          |  40.48 |   −2.35    |  45.66 |   +2.83
       5          |  40.20 |   −2.63    |  45.66 |   +2.83
       6          |  40.13 |   −2.70    |  45.66 |   +2.83
       7          |  41.91 |   −0.92    |  31.32 |  −11.51
Mean              |    —   |    2.43    |    —   |    4.49
From whole Range  |    —   |     —      |  57.50 |  +14.67




In five cases out of six the Gaussian gives better results than the Yulean in
the simple matter of finding means even for such a skew distribution as those of
ages of Husband and Wife.
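
The comparison may be restated numerically. The sketch below, an illustration with hypothetical variable names, takes the six estimates of the mean age of husband by each method as tabled above, forms their deviations from the true mean 42.8306, and counts the cases in which the Gaussian estimate is the closer.

```python
TRUE_MEAN = 42.8306  # true mean age of husband, as given in the text

# Estimates of the mean found from groups 2 to 7 of Table XIX.
gaussian = [38.57, 41.14, 40.48, 40.20, 40.13, 41.91]
yulean = [46.32, 46.32, 45.66, 45.66, 45.66, 31.32]


def mean_abs_deviation(estimates, true_value):
    """Average absolute departure of the estimates from the true value."""
    return sum(abs(e - true_value) for e in estimates) / len(estimates)


mad_gaussian = mean_abs_deviation(gaussian, TRUE_MEAN)
mad_yulean = mean_abs_deviation(yulean, TRUE_MEAN)
gaussian_better = sum(
    abs(g - TRUE_MEAN) < abs(y - TRUE_MEAN) for g, y in zip(gaussian, yulean)
)

print(f"Mean deviation, Gaussian: {mad_gaussian:.2f}")
print(f"Mean deviation, Yulean:   {mad_yulean:.2f}")
print(f"Gaussian closer in {gaussian_better} of {len(gaussian)} cases")
```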


Now let us turn to the Ivy Leaves (Table of Breadth)*:

Groups: 5–8.95, 10.95–12.95 [remaining group labels illegible].

                  |     By Gaussian     |      By Yulean
                  |  Value | Deviation  |  Value | Deviation
                  |  11.83 |   −1.39    |  14.18 |   +0.96
                  |  12.93 |   −0.29    |  11.06 |   −2.15
                  |  12.60 |   −0.62    |  11.06 |   −2.15
                  |  12.57 |   −0.65    |  11.06 |   −2.15
                  |  12.87 |   −0.34    |  11.18 |   −2.04
Mean              |    —   |    0.66    |    —   |    1.89
From whole Range  |    —   |     —      |  19.95 |   +6.74





It will be seen that with one exception the true mean breadth (13.2148 for this
very skew distribution of ivy leaves) would be substantially better found by using 
the Gaussian than by Mr Yule’s assumptions. 

Taking the barometric height at Southampton we have, noting that the actual 
mean height is 29.9814, the results given in the table on the following page.


In every case the Gaussian, we see, gives markedly better results than Mr Yule’s
method.


We think it safe to conclude that the Gaussian can be used to give quite a 
good approach to the means of variates classed in “broad categories”; it is far 


* Phil. Trans. Vol. 197 A, p. 352. Breadth instead of Length taken for that character appears still
more skew.

† Phil. Trans. Vol. 190 A, p. 428.
more adequate than the Yulean pseudo-ranks method. It must give better results
than a process which concentrates at unit-distances and renders any attempt at 
class-index correction impossible. 








                  |     By Gaussian     |      By Yulean
Groups            |  Value  | Deviation |  Value  | Deviation
29.5              | 30.036  |   +.055   | 29.695  |   −.287
29.6 to 29.9      | 30.006  |   +.025   | 30.128  |   +.147
30.0 to 30.1      | 29.998  |   +.016   | 30.039  |   +.058
30.2              | 29.997  |   +.016   | 30.095  |   +.113
30.3 to 30.5      | 30.007  |   +.026   | 29.784  |   −.198
Mean              |    —    | [illegible] |   —   | [illegible]
From whole Range  |    —    |     —     | 29.750  |   −.231














It may be said that Mr Yule has not used the Yulean to find means; in 
appearance perhaps not; in actuality he certainly has, for all product-moment 
processes reduce in actuality to finding the means of arrays. In fact: 

\[ r = \frac{S(n_{xy}\,x\,y)/N - \bar{x}\bar{y}}{\sigma_x\sigma_y} = \frac{S(n_x\,x\,\bar{y}_x)/N - \bar{x}\bar{y}}{\sigma_x\sigma_y}. \]

Here x in the summation term should be really the mean x of the individuals
in the class n_x. Thus in finding r we actually use the means x̄ and ȳ of the two
variates, the means ȳ_x of the y variate for all individuals in the class n_x of x’s, and
the mean x of all the individuals in the class n_x. Mr Yule’s method, as we have
just seen, must lead to big errors in all these means; it also as we have seen leads
to big errors in σ_x and σ_y. Hence if r comes out near the true value, this can
only arise from a compensation of errors, the exact measure of which has so far
only been determined for the case of a Gaussian distribution.
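
That the product-moment process reduces to finding the means of arrays can be seen on any grouped table. The sketch below is an illustration under stated assumptions: it uses the Sire and Colt threefold table of this paper with the hypothetical class values 1, 2, 3 (unit spacing, as on the pseudo-rank hypothesis), and computes r once from the cell products and once from the array means ȳ_x. The function name is invented for the example and no particular value of r is claimed.

```python
import math

# Sire and Colt threefold table: rows = colt classes, columns = sire classes.
table = [
    [79, 132, 47],
    [105, 426, 132],
    [37, 152, 190],
]
# Hypothetical unit-spaced class values (an assumption for illustration).
x_values = [1.0, 2.0, 3.0]  # sire classes
y_values = [1.0, 2.0, 3.0]  # colt classes


def correlation_two_ways(table, xs, ys):
    """Return r computed from cell products and r computed from array means."""
    n_rows, n_cols = len(ys), len(xs)
    total = sum(map(sum, table))
    n_x = [sum(table[i][j] for i in range(n_rows)) for j in range(n_cols)]
    n_y = [sum(row) for row in table]
    x_bar = sum(n * x for n, x in zip(n_x, xs)) / total
    y_bar = sum(n * y for n, y in zip(n_y, ys)) / total
    sd_x = math.sqrt(sum(n * (x - x_bar) ** 2 for n, x in zip(n_x, xs)) / total)
    sd_y = math.sqrt(sum(n * (y - y_bar) ** 2 for n, y in zip(n_y, ys)) / total)
    # (a) S(n_xy * x * y) over every cell of the table.
    cell_sum = sum(table[i][j] * xs[j] * ys[i]
                   for i in range(n_rows) for j in range(n_cols))
    # (b) S(n_x * x * mean of y in the array of that x).
    array_means = [sum(table[i][j] * ys[i] for i in range(n_rows)) / n_x[j]
                   for j in range(n_cols)]
    array_sum = sum(n_x[j] * xs[j] * array_means[j] for j in range(n_cols))
    r_cells = (cell_sum / total - x_bar * y_bar) / (sd_x * sd_y)
    r_arrays = (array_sum / total - x_bar * y_bar) / (sd_x * sd_y)
    return r_cells, r_arrays


r_cells, r_arrays = correlation_two_ways(table, x_values, y_values)
print(f"r from cell products: {r_cells:.4f}")
print(f"r from array means:   {r_arrays:.4f}")
```

Any error in the class values assigned therefore enters r through the array means and the standard deviations alike.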


(11) The Eye-Colour Data.


We now turn to the eye-colour data for parent and offspring where we think 
Mr Yule has been led into precisely the same fallacies by his method of pseudo- 
ranks as in the coat-colour of horses. The colour shades recognised by Francis 
Galton in his inquiries were : 


1. Light Blue.                 5. Light Brown.

2. Blue, Dark Blue.            6. Brown.

3. Grey, Blue-Green.           7. Dark Brown.

4. Dark Grey, Hazel.           8. Very Dark Brown, Black.


The book of data presented by Francis Galton to Pearson in 1899 contains 
under these entries a record of each family, grandparents, parents, offspring, uncles 
and aunts. No other data concerning the family were provided ;—only quite 
recently have the original records come into possession of the Eugenics Laboratory
and we now find they contain the important desideratum of age. It is proposed 
shortly to revise these tables of eye-colour, paying regard to the increasing pigmen- 
tation of the eye in extreme infancy and the decreasing pigmentation in extreme 
age. The inclusion of all cases, as no age was provided, has possibly something 
to do with the admitted irregularity of the tables. But does this irregularity 
invalidate the main conclusion drawn from the tables, i.e. that eye-colour pigment 
is inherited at the same rate as the measurable physical characters with a 
correlation lying somewhere between .46 and .50? The divisions made originally
in the tables were selected purposely and with very definite ends, namely, 
(i) to give as good a physiological difference between the two groups as possible, 
(ii) to put into the same class the same eyes of both parents and offspring, and 
(iii) to get the least probable error by taking the divisions as near the median 
as was practically realisable. Mr Yule disregards the importance of (ii) and (iii) 
and gives the values of the tetrachoric r_t found from a number of divisions,
several of which have so few individuals in the quadrant that they are extremely
untrustworthy. At the time the data were dealt with there was very little avail- 
able knowledge as to the distinction between a blue and grey eye. They were 
put into the same class, because it was considered that the total pigmentation of the 
iris of the grey eye was more akin to that of the blue eye, than to that of eyes like 
hazel with some macroscopic anterior pigment. The difference between the blue 
and the grey eye was considered to be one of structure rather than of pigment. 
With what we know of eyes now, we are not prepared to accept the Mendelian 
classification of eyes into those without and those with anterior pigmentation of 
the iris. If such a classification were absolutely legitimate, then grey eyes ought to 
be put with non-blue or with blue according to whether they possess such pigment. 
On such a classification as we have indicated the φ for inheritance of eye-colour
between parent and offspring ought to be 1/3. Accordingly to test the position for
the grey eye we ought to consider whether the correlations come out more nearly
1/3 with the greys put with the browns, or put with the blues, i.e. whether φ for
a fourfold table is nearer 1/3 for the division 1+2 : 3+4+5+6+7+8 or for
1+2+3 : 4+5+6+7+8. There cannot be a moment’s hesitation as to which
is the more strictly Mendelian division. We find: 


Values of Boas-Yulean φ.

Pair                 | Blue only, i.e. 1+2 | Blue and Grey, i.e. 1+2+3
Father and Son       |        .33          |          .37
Father and Daughter  |        .22          |          .28
Mother and Son       |        .28          |          .32
Mother and Daughter  |        .24          |          .34
Mean                 |        .27          |          .33
There seems no doubt that from the Mendelian standpoint, if anterior pigment
be taken as a “unit,” then pure grey eyes should be included with the blues and 
the correlation then comes out the true Mendelian third. Thus the original 
division of the tables between grey and the hazel groups, which appeared at the 
time the most reasonable physiologically and statistically, is amply justified by the 
theory of posterior and anterior pigment. 


Now Mr Yule tells us on the basis of his erroneous theory of pseudo-ranks 
that “the average estimated correlation” of these eye-colour “ tables is something 
like 1/3 not 1/2” (loc. cit. p. 620). We have indicated earlier in this paper (p. 232),
that if a fourfold table for continuous variates, skew or Gaussian, give by φ a


Diagram VI. Regression of eye-colour with eye-colour in brothers, the grade intervals being
assumed to give a normal distribution. Inset, the same, colours 4 to 8 only, the grade 
intervals being of same value as before, to show resemblance within the darker grades only. 
correlation of .33 this must be increased by something like 40%, if we wish to
find the true correlation of the variates. Had Mr Yule asserted that eye-pigmen-
tation was not a continuous variate, that all brown and hazel eyes were alike and
differed from blue and grey by the possession of some mysterious unit character,
then he would have been justified by the mere fourfold tables in asserting that the
correlation was 1/3. In doing this he would have taken up the simple Mendelian
position. That he did not do so was probably owing to the fact that he recognised, 
as the tables indeed show, that within the two groups, 1+2+3 the blue-grey 
group and 4+5+6+7+8 the hazel-brown group, there was distinct heredity of 
sub-divisions. The father with hazel eyes has an offspring less pigmented than the 
father with dark brown eyes. That such heredity exists must dispose at once of a 
correlation of 1/3 as representing the heredity of eye-colour in man. And here it
may be well to point out a very distinct difference between the problem which 
Pearson set himself and the problem for which the Mendelians profess to provide 
a solution—both speak of the heredity of eye-colour—but in no sense deal with 
the same problem*. The Mendelian says: “I treat the heredity of the character, 
presence or absence of anterior pigment.” The correlation in this case should
be 1/3. This problem, however, is not that of the heredity of the various grades of
pigment in the iris. There is very little doubt that grade of pigment is a perfectly 
continuous variate. The pigment of the iris is not solely confined to anterior and 
posterior faces, and quite different grades of pigment can occur in both these 


Diagram VII. Regression of eye-colour of brothers, the grade intervals being supposed equal.
Inset, the same, colours 4 to 8 only, to illustrate the reduction in the regression—which
is here the correlation—produced on the pseudo-rank hypothesis. 
situations, until you get in the albinotic eye almost a complete absence of pigment.
The accompanying diagrams, which have been drawn by two different methods, 
show how the heredity of eye pigment holds as well inside the blue-grey group as 
inside the hazel-brown group. They have been obtained in two ways, (i) by the 
use of the Gaussian to obtain the means of the arrays, and the relative ranges of 


* Even in the treatment of this problem we have only the papers of Hurst and Davenport which 
have been assumed to confirm each other. As a matter of fact on Hurst’s postulates as to the methods 
of observation, Davenport ought to have reached results discordant with Mendelism instead of con- 
firmatory! For the nature of these authors’ work on similar problems, see p. 209 above and Biometrika, 
Vol. vir. p. 403 and Vol. vii. pp. 269, 271, 272. 
the eye-colour, (ii) by Mr Yule’s method of unit groupings to indicate that even
that method shows the same result, the inheritance of the intensity of pigment 
inside the two main groups*. In an inset figure we have given, by the same 
methods, the increasing pigmentation of the second brother as the pigmentation 
of the first brother increases, when we remove all the blue-grey group from con- 
sideration. The general weakening of the correlation produced by using the 
method of pseudo-ranks will be obvious if Diagrams VI and VII are compared. 
The later diagrams, Nos. VIII and IX, show precisely the same point for the 
parental eye-colour data. It will be clear that if ⅓ were the limit to the correlation
of these eye-colour tables, then all the correlation within the blue-grey and
hazel-brown groups ought to have disappeared. Instead of this we find that the
⅓ is only the correlation on the Mendelian hypothesis that there is a unit difference
between individuals in the one group and individuals in the other; whatever 
this “unit difference” may refer to, it does not refer to a quantitative difference 
in pigmentation, because there is correlation within the groups. 


We have then the following results for fourfold parental eye-colour tables, the 
division being made between marked and less marked anterior pigment : 




















                              Variates supposed      Variates supposed to be
Classes                       to be discrete.        continuous and Gaussian.
                              Yulean φ               Tetrachoric r_t

Father and Son ...                 ·37                    ·55
Father and Daughter ...            ·28                    ·44
Mother and Son ...                 ·32                    ·48
Mother and Daughter ...            ·34                    ·51

Mean                               ·33                    ·49








Mr Yule’s statement that the correlation is “something like ⅓, not ½” amounts
to the denial that the variate is continuous; and the slightest inspection of our
diagrams shows that the variate is continuous; the question therefore turns on
whether the Gaussian assumption gives a reasonable approximation to the influence
of this continuity in increasing the correlation. We have shown that the fourfold
table result (φ) obtained by treating the variate as discrete requires, whether the
distribution be skew or Gaussian, to be increased by amounts ranging from 37% to
80%, when the value of φ is ·3 or upwards. We have accordingly little hesitation
in asserting that the true correlation exceeds ⅓ by at least 40%. Another way
of approaching the problem is the one adopted by Pearson, when he found the
want of stability in the tetrachoric r_t, as applied to these eye-colour tables, namely the
method of mean square contingency.


* Note how the slope of this regression line is ·5 for the main portion of Diagram VI.

Making the proper corrections we find:









































Order of    Nature of Classification     Father     Father       Mother     Mother
Table                                    and        and          and        and
                                         Son        Daughter     Son        Daughter

7×7         1:2:3:4:5+6:7:8              ·57        ·48          ·50        ·43
6×6         1:2:3:4:5+6:7+8              ·58        ·50          ·48        ·44
5×5         1+2:3:4:5+6:7+8              ·55        [illegible]  ·50        ·44
1st 4×4     1+2:3+4:5+6:7+8              ·53        ·50          ·45        ·41
2nd 4×4     1+2:3:4:5+6+7+8              ·54        ·56          ·50        [illegible]
1st 3×3     1+2:3+4:5+6+7+8              ·52        ·45          ·45        [illegible]
2nd 3×3     1+2+3:4:5+6+7+8              ·60        ·51          ·56        ·50

Mean*                                    ·55        ·50          ·49        ·44

Values originally given by Pearson†      ·55        ·44          ·48        ·51








That the eye-colour tables present anomalies has been fully admitted, but the
average mean square contingency based on 28 groupings of these four tables gives
the mean value ·4961; the mean given by Pearson originally from what he considered
and we still consider the natural fourfold division was ·4947. We have
no experience whatever of the average of a large number of corrected and wholly
à priori unselected contingencies giving a result too high by 33% of its value!
On the contrary, as we have shown on pp. 221-226 the contingency method
appears on the average to give values slightly in defect of the true correlation.

A study of our diagrams shows that it is in the very small groups 1 and 8
containing only 2·5% to 4·5% of parents that the marked deviations from
linearity of the regression occur, that is to say in the light blue-eyed and very
dark brown parents’ offspring. We suggest here that a considerable number of
the light blue parents may have been erroneously classified, on account of extreme
age, and a considerable number of the offspring of very dark-eyed parents may
have been classified as light blue because of extreme infancy. Both may really
belong to the mediocre classes. It is only in the extreme classes of small frequency
that the effect of this shifting of the mediocre would be sensible. The
Diagrams IX and X (pp. 248 and 250) show that in the bulk of cases, from group
2 to 6 or even to 7, the regression line has a slope of over ·5; the inset figures
show that this regression is maintained inside the brown group when we remove
the blues. We have already given similar diagrams for the Brother-Brother table.
These again show that in the centre of the figure the slope of the regression line
is at least ·5, while the defect at the tails very probably indicates the results of

* Calculated from the corrected contingency to four decimal places. 
† Phil. Trans. Vol. 195 A, p. 106. The division taken is that which for the use of a single division
only is physiologically and statistically the most reasonable; see above, p. 238.

‡ The Diagrams VI and IX were obtained by applying a normal curve to each array to obtain
the position of the mean on the assumption that the range of groups 3 and 4 remains the same on
the scale.










including cases in which one brother was an infant. 
The second diagram (p. 240) worked out by Mr Yule’s pseudo-ranks indicates also the
heredity within the brown group when blues are excluded, but it obviously distorts the
whole system and gives the spurious appearance of far less correlation.


Another manner of illustrating the idle character of Mr Yule’s assertion that
the true parental correlation is about ⅓ may be obtained from a comparison
of the following tables:





A           1+2      3      4    5+6    7+8    Totals

1+2         162     87     48     16     22      335
3           102     78     52     19     33      284
4            41     37     28     11     20      137
5+6          27     28     22      9     19      105
7+8          26     34     30     14     35      139

Totals      358    264    180     69    129     1000


B           1+2      3      4    5+6    7+8    Totals

1+2         194     70     41      9     21      335
3            83    124     41     13     23      284
4            25     34     55     11     12      137
5+6          27     12     19     24     23      105
7+8          29     24     24     12     50      139

Totals      358    264    180     69    129     1000


C           1+2      3      4    5+6    7+8    Totals

1+2         193     84     38     10     10      335
3           100     85     54     19     26      284
4            34     39     31     12     21      137
5+6          18     28     25     11     23      105
7+8          13     28     32     17     49      139

Totals      358    264    180     69    129     1000








Now the changes that will convert A into B are given by the scheme: 





B−A         1+2      3      4    5+6    7+8

1+2         +32    −17     −7     −7     −1
3           −19    +46    −11     −6    −10
4           −16     −3    +27      0     −8
5+6           0    −16     −3    +15     +4
7+8          +3    −10     −6     −2    +15

That is to say to obtain B from A we must take between } and } of the 
material of A and transfer it to the diagonal cells from those cells away from the 
diagonal, the only failure of this rule is the +3 in the cell 1+2:7+8. There 





can be no doubt that the correlation of B is very much greater than the correlation
of A. To convert C into B the changes are given by the scheme:














B−C         1+2      3      4    5+6    7+8

1+2          +1    −14     +3     −1    +11
3           −17    +39    −13     −6     −3
4            −9     −5    +24     −1     −9
5+6          +9    −16     −6    +13      0
7+8         +16     −4     −8     −5     +1


Here again the candid reader will, we believe, admit that B has more correla- 
tion than C. The general difference is again a transfer to the diagonal cells of 
frequency to the right and left of this diagonal, the total number so transferred is 
78; but the transfer is rendered more complex by the appearance of 36 units in 
the cells in the top right-hand and bottom left-hand corners. This transfer is 
only half the size of the other towards the diagonal, and it probably measures 
the frequency of senile blues in the parents and of infantile blues in the offspring. 
We hold that we must still credit B with more correlation than C, although this 
secondary movement can now be well recognised. B is the contingency table for
Father and Son’s eye-colour (see p. 186). A is a table for a Gaussian surface with
·3 correlation, C is a similar table for a Gaussian surface with ·5 correlation, both
being adjusted to give frequencies to unit places only with the same marginal
totals as the Father and Son eye-colour table. These comparisons suffice to
indicate that the correlation of Father and Son is far greater than ·30 and is
probably slightly greater than ·50.
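
The transfers described above for the Father and Son tables can be checked mechanically. What follows is only a minimal sketch in Python. It assumes the cell frequencies of A, B and C as reconstructed from the printed marginal totals and difference schemes. The helper names `difference`, `diagonal_gain` and `corner_gain` are ours, and "corner" is taken, as an assumption, to mean any cell three or more classes away from the diagonal.

```python
# Father-Son eye-colour tables (rows and columns: 1+2, 3, 4, 5+6, 7+8).
# Cell values are a reconstruction checked against the marginal totals.
A = [[162, 87, 48, 16, 22],
     [102, 78, 52, 19, 33],
     [41, 37, 28, 11, 20],
     [27, 28, 22, 9, 19],
     [26, 34, 30, 14, 35]]
B = [[194, 70, 41, 9, 21],
     [83, 124, 41, 13, 23],
     [25, 34, 55, 11, 12],
     [27, 12, 19, 24, 23],
     [29, 24, 24, 12, 50]]
C = [[193, 84, 38, 10, 10],
     [100, 85, 54, 19, 26],
     [34, 39, 31, 12, 21],
     [18, 28, 25, 11, 23],
     [13, 28, 32, 17, 49]]


def difference(x, y):
    """Cell-by-cell scheme x - y, e.g. the scheme B-A."""
    return [[a - b for a, b in zip(rx, ry)] for rx, ry in zip(x, y)]


def diagonal_gain(diff):
    """Total frequency that has to be added to the diagonal cells."""
    return sum(diff[i][i] for i in range(len(diff)) if diff[i][i] > 0)


def corner_gain(diff, min_offset=3):
    """Total frequency added to cells at least min_offset classes off the diagonal."""
    n = len(diff)
    return sum(diff[i][j] for i in range(n) for j in range(n)
               if abs(i - j) >= min_offset and diff[i][j] > 0)


b_minus_a = difference(B, A)
b_minus_c = difference(B, C)
print(diagonal_gain(b_minus_a), corner_gain(b_minus_a))
print(diagonal_gain(b_minus_c), corner_gain(b_minus_c))
```

With these reconstructed frequencies the scheme B−C gives 78 units moved to the diagonal and 36 in the corner cells, the two figures quoted in the text; the scheme B−A gives a single corner gain, the +3 noted above.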





Precisely similar treatment of the Brother-Brother eye-colour table leads to 
like conclusions, as the reader may judge from what follows. 


Consider the three tables : 

















A           1+2      3      4    5+6    7+8    Totals

1+2         398    283    113     66     71      931
3           283    281    135     87    115      901
4           113    135     72     50     73      443
5+6          66     87     50     36     58      297
7+8          71    115     73     58    111      428

Totals      931    901    443    297    428     3000























B           1+2      3      4    5+6    7+8    Totals

1+2         496    224     63     68     80      931
3           224    418     97     56    106      901
4            63     97    168     47     68      443
5+6          68     56     47     70     56      297
7+8          80    106     68     56    118      428

Totals      931    901    443    297    428     3000


C           1+2      3      4    5+6    7+8    Totals

1+2         471    278     94     48     40      931
3           278    300    140     86     97      901
4            94    140     78     55     76      443
5+6          48     86     55     42     66      297
7+8          40     97     76     66    149      428

Totals      931    901    443    297    428     3000











Now mark what change must be made in A to correct it to B; this is
given by the scheme:








B−A         1+2      3      4    5+6    7+8

1+2         +98    −59    −50     +2     +9
3           −59   +137    −38    −31     −9
4           −50    −38    +96     −3     −5
5+6          +2    −31     −3    +34     −2
7+8          +9     −9     −5     −2     +7





Can there be any doubt that the scheme B—A marks immensely increased 
correlation? To pass from A to B, we must accumulate 372 individuals along the 
diagonal as against 11 individuals drawn towards the corners away from this 
diagonal. We take it that only the most captious person could possibly deny 
that to reach B from A, we must transfer individuals in a manner which markedly
increases the correlation.


Now consider the change which must be made in C to obtain B. It is given
by the scheme:

B−C         1+2      3      4    5+6    7+8

1+2         +25    −54    −31    +20    +40
3           −54   +118    −43    −30     +9
4           −31    −43    +90     −8     −8
5+6         +20    −30     −8    +28    −10
7+8         +40     +9     −8    −10    −31


To pass from C to B we must transfer 261 individuals to the diagonal; but 
there is an outward movement of 69 to each corner and a resulting defect of —31 
in the fifth diagonal cell. There is a total movement of 261 individuals towards 
greater, and of 169 towards lesser correlation. We should have no hesitation in 
saying that the correlation of B is higher than that of the C table. We would go 
further and say that until we find a table D in which these movements towards 
the diagonal and towards the outer corners nearly balance we have not yet reached 


a table with a correlation equal to that of B. Consider the table: 
D           1+2      3      4    5+6    7+8    Totals

1+2         494    275     88     42     32      931
3           275    308    142     86     90      901
4            88    142     81     56     76      443
5+6          42     86     56     44     69      297
7+8          32     90     76     69    161      428

Totals      931    901    443    297    428     3000




















B−D         1+2      3      4    5+6    7+8

1+2          +2    −51    −25    +26    +48
3           −51   +110    −45    −30    +16
4           −25    −45    +87     −9     −8
5+6         +26    −30     −9    +26    −13
7+8         +48    +16     −8    −13    −43





Here 225 individuals must be transferred to the diagonal to get from D to B, 
i.e. there is an increase to this extent of the correlation, but 223 have got to be 
transferred outwards to the ends of the other diagonal, although not in such a 
concentrated fashion. There can be small doubt, we think, that the correlation of
B is nearly, if not slightly greater than, that of D. B is the correlation table for
Brothers, A, C and D have correlations respectively of ·28, ·45 and ·50. We feel
confident in asserting that the unknown correlation of B is far nearer to that of
C or D, than of A, which Mr Yule gives as its limit! Nor can the reader who
examines these tables fail to obtain true insight into the peculiar nature of the
correlation of B. It is clear that B fails of normality because there is an excess of
dark brothers having brothers with blue eyes. The excess is greater than in the
case of father and son and this is actually what we should expect, if the excess be
due to the inclusion in the record of infants. For an infant will be reckoned
once as a son, but if he is one of n brothers, he will appear n(n − 1) times. If
our surmise be correct there may be no failure at all of approximately Gaussian
frequency in these eye-colour tables, but anomalous lumps in the second and
fourth quadrants due to this unconscious inclusion of infants. If these lumps,
which the inclusion of a dozen to twenty infants would suffice to produce, be
removed, we should expect the correlation to rise as in the Huxley Lecture to
about ·60. What the final value would be after the correction for the more
gradual changes in eye-colour, which occur after infancy*, remains to be determined;
we may be quite certain, however, that it is likely to be 80 to 100%
above Mr Yule’s maximum of ·28.
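
The balance of movements used in comparing the Brother tables can be tallied as follows. This is a hedged sketch only, assuming the cell frequencies of A, B, C and D as reconstructed from the printed marginal totals. The function name `movement` is ours, and it makes the simplifying assumption that every positive change off the diagonal, together with every deficit on the diagonal, counts as movement away from the diagonal.

```python
# Brother-Brother tables (rows and columns: 1+2, 3, 4, 5+6, 7+8), reconstructed.
A = [[398, 283, 113, 66, 71], [283, 281, 135, 87, 115], [113, 135, 72, 50, 73],
     [66, 87, 50, 36, 58], [71, 115, 73, 58, 111]]
B = [[496, 224, 63, 68, 80], [224, 418, 97, 56, 106], [63, 97, 168, 47, 68],
     [68, 56, 47, 70, 56], [80, 106, 68, 56, 118]]
C = [[471, 278, 94, 48, 40], [278, 300, 140, 86, 97], [94, 140, 78, 55, 76],
     [48, 86, 55, 42, 66], [40, 97, 76, 66, 149]]
D = [[494, 275, 88, 42, 32], [275, 308, 142, 86, 90], [88, 142, 81, 56, 76],
     [42, 86, 56, 44, 69], [32, 90, 76, 69, 161]]


def movement(target, start):
    """Return (towards_diagonal, away_from_diagonal) needed to turn start into target."""
    towards = away = 0
    for i, (row_t, row_s) in enumerate(zip(target, start)):
        for j, (t, s) in enumerate(zip(row_t, row_s)):
            change = t - s
            if i == j and change > 0:
                towards += change      # frequency accumulated on the diagonal
            elif i == j:
                away += -change        # deficit in a diagonal cell
            elif change > 0:
                away += change         # frequency accumulated off the diagonal
    return towards, away


for name, table in (("A", A), ("C", C), ("D", D)):
    print(name, movement(B, table))
```

On these figures the pairs are (372, 22) for A, (261, 169) for C and (225, 223) for D. The 11 quoted in the text for A counts only one half of the symmetrical table.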


* The influence of age on parental and fraternal tables of eye-colour will be shortly considered 
de novo. 
But let us look at the matter from other standpoints. We have the stature
data for Father and Son arranged in the eye-colour groups. Let us arrange it
in a fourfold table, precisely as we did the eye-colour in order to demonstrate
Mendelism, i.e.

                                  Stature of Father.

                              1+2+3     4+5+6+7+8     Totals

Stature      1+2+3             464         155          619
of Son.      4+5+6+7+8         158         223          381

             Totals            622         378         1000





We may give these a class-name, say, Short and Tall, corresponding to Short
Muzzle and Long Muzzle, or to “Over 30” and “Under 30” egg hens, etc., etc.
We have at once a Mendelian table:

                     Short Fathers     Tall Fathers     Totals

Short Sons ...            464              155            619
Tall Sons ...             158              223            381

Totals ...                622              378           1000


Now assume a discrete unit between Short and Tall and find the Boas-Yulean φ.
It is
                              φ = ·335,
and as its Mendelian value should be ·333, the agreement is extraordinary! Tallness
and shortness in man are clearly Mendelian presence and absence of a unit
character! The eye-colour tables at the same divisions show exactly the same
result. Surely Mr Yule is satisfied that the correlation of stature in man is of
the order ⅓? Yet the tetrachoric r_t in these cases at the same division gives for
eye-colour ·55 and for stature ·51, i.e. is some 60 to 70% greater. Now let us
treat these two cases by an identical process; we use a normal horizontal and a
normal vertical scale, and we determine all ranges in terms of the range of the
frequency of groups 3 + 4 treated as of length h. We find the mean values of the
arrays of sons for each group of fathers by assuming that the means will be
approximately given if the array be treated for this purpose as Gaussian (see
p. 236). The results are given in the table on p. 249.
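
The value of φ quoted for the Short and Tall table is easily verified. The following is a minimal sketch, assuming the usual fourfold formula φ = (ad − bc)/√{(a + b)(c + d)(a + c)(b + d)}; the function name `fourfold_phi` is ours.

```python
from math import sqrt


def fourfold_phi(a, b, c, d):
    """Product-moment correlation of a fourfold table scored 0/1 on both margins."""
    return (a * d - b * c) / sqrt((a + b) * (c + d) * (a + c) * (b + d))


# Rows: Short Sons, Tall Sons; columns: Short Fathers, Tall Fathers.
phi_stature = fourfold_phi(464, 155, 158, 223)
print(round(phi_stature, 3))
```

The result agrees to three places with the ·335 of the text, and differs from the Mendelian ⅓ only in the third decimal place.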


Can any reader who examines these data deny the remarkable parallelism of the 
two cases? Let him also look at the Diagrams VIII and IX (p. 248) drawn for 
stature and eye-colour and ask himself whether it is possible to make any distinc- 
tion between the cases of inheritance of eye-colour and inheritance of stature, 
beyond the irregularity depending on the group of 36 cases (3·6% of the whole)
of dark blue-eyed fathers. The two regression lines run with almost complete 
agreement, the stature data being a little more regular than the eye-colour data. 
Diagram VIII. Regression in Stature. Father and Son.

Diagram IX. Regression in Eye-colour. Father and Son.

[Caption fragments: “Based upon means obtained by Gaussian ...”; “... colour groups.”]
Mean Son                                          Eye-Colour      Stature

Group 1                                           −·4336h         −·8255h*
  „   2                                           −·5065h         −·4778h
  „   3                                           −·0166h         −·0246h
  „   4                                           +·1319h         +·2686h
  „   5+6                                         +·6704h         +·5611h
  „   7                                           +·7513h         +·7353h
  „   8                                           +1·1414h        +1·0455h

Standard Deviation σ_f                             ·8247h          ·8247h
    „        „     σ_s                             ·8932h          ·8932h
η uncorrected from above                           ·5217           ·5074
η corrected                                        ·5420           ·5224
Slope of Regression Line, from η used as r         ·5870†          ·5658
True value of r                                    ?               ·5189
Tetrachoric r_t‡                                   ·5503           ·5104


Can any one continue to assert with Mr Yule that the correlation for eye-colour is
about ⅓ and that for stature is ½? Is it not clear that for eye-colour in Father
and Son the value given by Pearson in 1900 is within ·02 of the true value?


A similar figure (Diagram X) is given on p. 250 for Mother and Son. This is more
irregular in the terminal groups (which contain 3·6% and 4·6% of the frequency
only), but it shows the same points. The uncorrected η = ·4930, the corrected
η = ·5109 and the regression is ·5459. These are in quite good agreement with
the value of r as found from contingency, i.e. ·4885 (see p. 241), but differ slightly
in excess from the value given by the tetrachoric r_t for the “Mendelian” table,
i.e. correlation ·4817 and regression ·5145. It is clear from the diagram that r,
as regards η, has been lessened by the deviations from linearity in the terminal
classes§. Still the slope of the regression line shows that we are far from dealing
with a correlation of ⅓ for there is approximate equality in the variations of the
eye-colour in mother and son, σ_m = ·8574h and σ_s = ·9161h.
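
The regression slopes quoted above follow from the corrected η and the two standard deviations. Here is a brief sketch, assuming the slope of son on parent is η·σ_s/σ_p, with η used in place of r. Taking ·8247h as the fathers' and ·8932h as the sons' standard deviation is our reading of the table, and the function name `regression_slope` is ours.

```python
def regression_slope(eta, sd_son, sd_parent):
    """Slope of the regression of son on parent when eta is used as r."""
    return eta * sd_son / sd_parent


# Standard deviations are in units of h, the range of groups 3 + 4.
father_son_eye = regression_slope(0.5420, 0.8932, 0.8247)
father_son_stature = regression_slope(0.5224, 0.8932, 0.8247)
mother_son_eye = regression_slope(0.5109, 0.9161, 0.8574)

for slope in (father_son_eye, father_son_stature, mother_son_eye):
    print(round(slope, 4))
```

On this reading the three slopes agree with the printed ·5870, ·5658 and ·5459, which supports the assignment of the two standard deviations.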


(12) The Vaccination Data. 


We now turn to the question of vaccination and recovery from small-pox.
Mr Yule apparently considers either his coefficient of association, or his coefficient of 
colligation, to be the right coefficient to use here. Now both these coefficients are 


* The means in the case of the two terminal arrays had to be somewhat differently treated as there
is no frequency in the first array of groups 5-8 and in the last array of groups 1+2. Accordingly if the
h of the table be h₃,₄, the mean of the first array was found in terms of h₂, i.e. the range of group 2
for that array, and the mean of the last array from h₅,₆,₇ for that array, i.e. the range of groups 5+6+7
for that array. h₂ and h₅,₆,₇ were then expressed in terms of h₃,₄, i.e. of h of the table by means of
the relations between these subranges in the marginal total frequencies for sons.

† To four figures the value as given in 1900 was ·5159.

‡ For divisions as originally given in the Phil. Trans. memoir, Vol. 195 A, p. 106.

§ See remarks, p. 164 above, on the true measure of correlation.
unchanged by the artificial selection in which Mr Yule sees* “a most important
property and one of special importance in such cases as those I have chosen for
illustrations,” i.e. vaccination and recovery. On the other hand the φ, or pseudo-rank
correlation, is for a fourfold table immensely influenced, as we have shown, by
selection of this artificial kind. If this φ, however, be artificially selected, so that
it deviates widely from the φ of the data actually provided, then this artificial
value of φ (for a table, say, in which the number of vaccinated has been made
equal to the number of unvaccinated and the number of deaths equal to the
number of recoveries) is Mr Yule’s coefficient of colligation. From our standpoint
it is hard to conceive a stronger argument against this coefficient of
colligation than the fact that it is unequal to φ, unless you have artificially

Diagram X. Regression in Eye-colour. Mother and Son. Based upon means obtained 
by Gaussian assumptions. 
doctored φ to bring the two into agreement. What is the probable error of
φ thus doctored or how is it related to the probable error of φ found from
the undoctored material? Mr Yule, however, writes: “We have therefore the
important theorem briefly mentioned without proof in p. 17: the coefficient of
colligation ω for any table is the product-sum correlation r† for the equivalent
symmetrical table. These two coefficients r and ω form, accordingly, a natural
pair, the first giving the actual correlation in the given table, the second the
correlation in a derived table of standard form, thus enabling us to compare the


* Loc. cit. p. 587.
† Mr Yule here as elsewhere terms φ the “product-sum correlation” and uses for it the letter r.
This is wholly unjustifiable, it is merely a correlation of pseudo-ranks and not of true variates
at all.
two tables freed from the effects of ‘selecting varying proportions of A’s and B’s’”
(loc. cit. p. 597). The italics are Mr Yule’s. We have rarely come across more 
specious reasoning. A coefficient is selected which for one type of artificial 
selection is not changed, and this peculiarity is termed a special and important 
feature of the coefficient. Another coefficient which is intensely subject to this 
selection is then commended because it can be selected so as to agree with the 
first; and Mr Yule terms them a “natural pair.” Algebraically Mr Yule starts
with the table

\[ \begin{array}{|c|c|} \hline a & b \\ \hline c & d \\ \hline \end{array} \]

and the Q for this is (ad − bc)/(ad + bc). This value of Q remains unchanged if
we write the table

\[ \begin{array}{|c|c|} \hline \sqrt{ad} & \sqrt{bc} \\ \hline \sqrt{bc} & \sqrt{ad} \\ \hline \end{array} \]

where the marginal frequencies have now been rendered artificially all the same.
The original φ however was

\[ \frac{ad - bc}{\sqrt{(a+b)(a+c)(b+d)(c+d)}}, \]

and has changed to

\[ \frac{\sqrt{ad} - \sqrt{bc}}{\sqrt{ad} + \sqrt{bc}}, \]

which is Mr Yule’s colligation and a function of Q. The original φ has many
important properties, it is Pearson’s mean square contingency for a fourfold table
and determines the probability of the two variates being independent; it is also
the correlation of means if they be measured from their dividing lines in terms of
their standard deviations, supposing the material continuous and approximately
normal. The new φ possesses neither of these really important properties, it is
the φ of the table after most artificial selection, and that it agrees with Mr Yule’s
coefficient of colligation neither gives validity to that coefficient, nor endows it
with any new property whatever.
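
The algebra of the preceding paragraph may be illustrated numerically. What follows is only a sketch, using for illustration the fourfold Glasgow vaccination figures (273, 1301, 65, 50) that appear in this section. The function names are ours, and `equalise` is our reading of the "equivalent symmetrical table", rescaled so that the total N is kept.

```python
from math import sqrt


def phi(a, b, c, d):
    """Fourfold phi: (ad - bc) / sqrt of the product of the four marginal totals."""
    return (a * d - b * c) / sqrt((a + b) * (c + d) * (a + c) * (b + d))


def yule_q(a, b, c, d):
    """Yule's coefficient of association Q."""
    return (a * d - b * c) / (a * d + b * c)


def colligation(a, b, c, d):
    """Yule's coefficient of colligation."""
    s, t = sqrt(a * d), sqrt(b * c)
    return (s - t) / (s + t)


def equalise(a, b, c, d):
    """Symmetrical table with cells proportional to sqrt(ad), sqrt(bc); total unchanged."""
    n = a + b + c + d
    s, t = sqrt(a * d), sqrt(b * c)
    k = n / (2 * (s + t))
    return (k * s, k * t, k * t, k * s)


# Rows: vaccinated, unvaccinated; columns: deaths, recoveries.
table = (273, 1301, 65, 50)
selected = equalise(*table)

print(round(phi(*table), 3), round(yule_q(*table), 3), round(colligation(*table), 3))
print(round(phi(*selected), 3), round(yule_q(*selected), 3))
```

The signs are negative because vaccination goes with fewer deaths. In magnitude φ is about ·25 for the data as recorded and about ·43 after equalisation, where it coincides with the colligation, while Q is about ·72 both before and after.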


Mr Yule has not taken the trouble to see what sort of effect his artificial 
selection really has on actual material. We propose to illustrate it on certain 
vaccination data. We give, as Table XXI, the table of Glasgow data published in 
Biometrika, Vol. vii. p. 257. It is not true that the haemorrhagic and confluent 
cases all die, but a very large percentage of them die and it would be not far 
from the fact to represent the data by a fourfold table: 








                      Deaths     Recoveries     Totals

Vaccinated ...          273         1301         1574
Unvaccinated ...         65           50          115

Totals ...              338         1351         1689








This table is only illustrative, not, of course, a rigorous experience. 


Now we have not assumed Table XXI to be a Gaussian distribution, but we have
found the means of the arrays by assuming each to be represented by a Gaussian 
curve, a process which, as we have seen, gives a fairly close result even for skew 


material. We have also, in order to get scales of a reasonable character, arranged 








TABLE XXI.

Severity of Attack and Strength of Immunity due to Vaccination.
City of Glasgow Hospital.

                                     Severity of Attack.
Years since
vaccination      Haemorrhagic   Confluent   Abundant   Sparse   Very Sparse   Totals

0-10                   0             1           6        11         12          30
10-25                  5            37         114       165        136         457
25-45                 29           155         299       268        181         932
Over 45               11            35          48        33         28         155
Unvaccinated           4            61          41         7          2         115

Totals                49           289         508       484        359        1689


Diagram XI. Regression of the intensity of vaccination upon severity of attack and of
severity of attack upon intensity of vaccination, showing the continuity of the
relationship. Measured on Gaussian scales. [Axes: intensity of vaccination measured in
period since vaccination; severity of attack.]
the severity of attack and immunity due to vaccination on Gaussian bases. This
is done in Diagram XI, and the corresponding regression lines are drawn. These 
were calculated by supposing the range of the mid or ‘abundant’ group the same 
for all arrays of immunity, and the group 25—45 the same for all arrays of 
severity. It will be seen at once from this diagram that the division into 
vaccinated and unvaccinated is a merely verbal one, that Mr Yule is playing with 
words, not dealing with the realities under class-names*. Immunity and severity 
are continuously changing quantities and they vary in almost linear relationship to 
each other, the greater the intensity of the immunity associated with vaccination 
the less severe is the attack. We do not think the deviations from linearity 
of regression found in this case differ sensibly from the sort of values that 
frequently occur in the case of the regression line found by the product-moment 
method. 


In order, however, that Mr Yule may not attribute this result to the use of the 
Gaussian distribution to find the means, we applied his method of pseudo-ranks to 
determine the means of the arrays and then put in the regression lines on his 
hypothesis that the rank of a huge bracket is a suitable unit in which to measure 
correlation. This is done in the lines A of Diagrams XII and XIII. Although
the previous scheme is much distorted, we see that on Mr Yule’s own hypothesis 
the two variates behind the “vaccination—non-vaccination” and “death—recovery” 
classifications are continuous, steadily increasing in one direction, and that there 
is the correlation of a continuous two-variate system behind them. 


Now let us proceed to do exactly what Mr Yule has done, namely make
divisions at vaccinated and non-vaccinated, and between confluent and abundant, 
and dress the table so that there shall be equal numbers of vaccinated and un- 
vaccinated and of the very severe and less severe cases. The table thus transformed 
is Table XXII. The difficulty now is to know what to do with the marginal scales ; 
if these in the natural condition were normal, they are hardly likely to remain so 
after selection. But these scales, especially as we have found the correlations by
the η-process, are not of first class importance except for exhibition of the results
graphically. We have therefore retained the old scales and merely calculated the 
means of the arrays condensed at the old means of the old scale of years since 
vaccination (see Diagram XIV). In order that the reader may see how little 
assumption is thus made we have had a second diagram drawn indicating what 


* “The division may also be vague and uncertain: sanity and insanity, sight and blindness, pass
into each other by such fine gradations that judgments may differ as to the class in which a given
individual should be entered. The possibility of uncertainties of this kind should always be borne in
mind in considering statistics of attributes: whatever the nature of the classification, however, natural
or artificial, definite or uncertain, the final judgment must be decisive; any one object or individual must
be held to possess the given attribute or not.” Theory of Statistics, pp. 8, 9. Similar words are used
by Mr Yule in Biometrika, Vol. II. p. 121: “The judgment must however be finally decisive; intermediates
not being classed as such even when observed.” We can hardly conceive statements more
liable to prejudice the mind of a wavering recorder of actual data. No wonder Mr Sanger said that
“Mr Yule’s work would be the work for Mendelians”!
TABLE XXII.

Immunity and Severity of Vaccination Table after a Yulean Equalisation Process.

Years since
vaccination      Haemorrhagic   Confluent   Abundant   Sparse   Very Sparse   Totals

0-10                  0             0·9         2·8       5·1        5·6        14·4  )
10-25                 4·4          32·8        52·8      76·4       63·0       229·4  )  844·5
25-45                25·7         137·4       138·5     124·1       83·8       509·5  )
Over 45               9·7          31·0        22·2      15·3       13·0        91·2  )
Unvaccinated         37·1         565·5       198·4      33·9        9·6       844·5

Totals               76·9         767·6       414·7     254·8      175·0      1689·0

                 (Haemorrhagic and Confluent: 844·5)   (Abundant to Very Sparse: 844·5)

Diagrams XII and XIII. Intensity of Vaccination and Severity of Disease treated by the method of
pseudo-ranks. (A) original data, (B) Haemorrhagic and Confluent made 50% and vaccinated made 50%.
Diagram XII: (i) Regression of intensity upon severity.
Diagram XIII: (ii) Regression of severity upon intensity.
[Axes: period since vaccination; Severity of Attack.]


changes would have been made had we used as horizontal scale a Gaussian calculated
on the new distribution of total severities. It will be seen on comparing Diagrams
XIV and XV that no substantial difference is made in the fundamental result.
Further to test the effect of concentration we have had η, the correlation ratio,
calculated by concentration at the means of the unselected scale of years since
vaccination and by treating each array as a Gaussian; the value found for it in
the former case (when uncorrected for class-index correlation of severity) is ·32,
while in the latter case the three-rowed table gave η = ·32 and ·34, with a mean
value ·33*. Both these values would need the same class-index correction for
severity, i.e. a class-index correlation of ·9502, which renders them ·34 and ·36 respectively.


Further, to show that a wide range of hypotheses can be made without 
modifying the fundamental conclusion, we have again plotted the regression lines 
on the Yulean hypothesis of pseudo-ranks before and after equalising deaths and 


* Biometrika, Vol. vit. p. 257. The value of r found by concentrating at the Gaussian means of the
marginal subranges is ·34, quite close to the η values as corrected.

recoveries, vaccinations and non-vaccinations (see Curves B of Diagrams XII and
XIII). The fundamental result is obvious in all these diagrams, the selection,
which does not change Mr Yule’s coefficients, changes widely the true relationship 
as measured by the regression lines between the variates immunity conferred by 
vaccination and the severity of small-pox. Take a given grade of severity of the 
disease, and the grade of immunity which is associated with it is entirely altered by 
the selection, or take a definite grade of vaccination and the average severity of the 
disease is represented by quite different figures before and after selection. Mr Yule’s 
most important property of his coefficients—the property that they remain 
unaltered by this selection—is the very property which forces us to the conviction 
that they are wholly unsuited to use in such a case as that which he asserts is 
eminently fitted for their use. 


If we consider the matter algebraically, we find: 


                         [η]                        Yulean φ                    Contingency*
                    Uncorrected   Corrected    Uncorrected   Corrected     Uncorrected   Corrected

Before selection    ·32 to ·34    ·34 to ·36      ·26           ·34        [illegible]   [illegible]
After selection        ·58           ·61          ·52           ·60           ·50           ·58
| ; uy 
We contend that a method which so substantially changes the real relationship 
between immunity and severity is wholly incapable of leading those who use it to 
any just inference as to the association of immunity and severity. It is only a 
logical quibble quite unworthy of the reputation of the man who uses it, and 
directed at an audience which had made no thorough study of the mathematics 
of statistics, to state as Mr Yule does that a man dies or does not die after 
incurring small-pox, and that he is vaccinated or not vaccinated. Death and 
vaccination are crude class-indices of severity and immunity, and Mr Yule’s 
coefficients tell us nothing of the real relation between the two variates which 
is what we at least are seeking. 


* Calculated by concentrating the contents of each cell at the Gaussian centroids of the subranges 
of the marginal totals and correcting for class-index correlations; see Biometrika, Vol. IX. p. 119. 

(13) On the Stability of Coefficients of Association. 

If we have, as we believe we certainly have, in this paper succeeded in 
demonstrating the idleness of Mr Yule’s coefficients of association and colligation 
for any purpose of applied statistics; if, as we hold, they are merely of interest 
from the standpoint of symbolic logic, i.e. in the discussion of verbal classifications, 
not in the treatment of the real things lying behind class-indices; the reader may 
naturally ask how we propose to treat similar problems. The answer to our mind 
is fairly clear. When the variates truly advance by unit grades, then there is no 
difficulty whatever about the problem. The right method to use is the 
product-moment correlation. This method has been used from the “earliest times” by 
biometricians, e.g. in dealing with the teeth on the carapace of prawns, the 
prickles on holly leaves, stigmatic bands on poppies, veins on leaves, etc., etc. 
There is absolutely nothing new in the Yulean method of pseudo-ranks applied to 
these data; the rank is the true measure of the variate in such cases. But the 
method is wholly false when applied to continuous variates in such cases as 
Mr Yule applies it. It is wholly false in particular when applied to fourfold 
tables, unless the difference between the two classes is a measurable unit. This, 
as we have indicated, is not the case in the Mendelian results of actual practice; 
it is not the case in the vaccination data or in any one of the cases to which 
Mr Yule has applied his coefficients. In no cases is he dealing with discrete unit 
differences. The nearest approach to a discrete difference is possibly in the case 
of sex, or in the use of a spear or a sword in negro tribes as alternatives, although 
even in these cases it is hard to understand why the male is a unit in excess or 
defect of the female or why a spear is a sword plus or minus a unit. But if we 
admit this, then φ is the appropriate coefficient to use, and not Mr Yule’s 
association or colligation, and it should be applied to the original data and not 
to the adjusted or equalised table. Personally we should never use φ in such 
cases; we should measure the probability that the variates were independent, 
i.e. deduce the P from nφ² by Elderton’s tables, and this would guide our 
judgment in the matter*. 

Diagrams XIV and XV. Regression of Intensity of Vaccination upon Severity of Small-pox. 
(A) Original data. (B) Vaccinated made 50% and haemorrhagic and confluent 50% 
(Yule’s Hypothesis). Intensity of Vaccination and Severity of Attack treated by Gaussian 
Methods. 
(i) Assuming “severity” has a Gaussian distribution after the change of frequencies. 
(ii) Assuming the centroids of each severity group unaltered by the change of frequencies. 

* We can get rid of the main effect of influence of total number of cases considered, and of the 
inequality of the marginal grouping, by adopting the method proposed by Pearson and thinking on a 
correlation scale: see “On a Novel Method of regarding the Association of two Variates classed solely 
in Alternate Categories,” Drapers’ Company Research Memoirs, Biometric Series VIII, Dulau & Co. 1912. 
In that paper the values of the correlation on the probability scale r_p were only tested against 
tetrachoric r_t, for such totals and divisions as occur most frequently in everyday practice. A caution must be 
given here against the extension of that method without fuller investigation of such cases to divisions 
giving small percentages in the marginal frequencies. For such cases we know that tetrachoric r_t gives 
poor results. 
But the cases of approach to discrete differences are exceedingly few in number 
compared to the mass of cases with which we have to deal; they exist as a rule 
only in the class-names, not in the things classified. Personally we have rarely 
found them except as already stated in theoretical Mendelian investigations, and 
to such cases the method of ranks, i.e. use of a ranks coefficient whether for 
fourfold or manifold tables, was first and rightly applied by Pearson. The Biometric 
School criticise not the application to Mendelian theory of these methods by Dr 
Brownlee and others, in which they only followed what had already been done, but 
Dr Brownlee’s applying tetrachoric r_t to such theoretical tables and then supposing 
that he had got at the root of the difference between theoretical and actual heredity 
correlation tables! Mr Yule nowhere distinguishes between a theoretical Mendelian 
table and what we can absolutely demonstrate not to be theoretical Mendelism, 
e.g. Pearson’s pigmentation tables. He writes: “As regards Dr Snow’s recent 
comments in Biometrika on the use of the normal coefficient for Mendelian 
tables in Dr Brownlee’s paper, he really thought that those comments were a 
much stronger condemnation of Professor Pearson’s than of Dr Brownlee’s work. 
Professor Pearson had repeatedly used the normal coefficient for inheritance 
tables, that were in all probability representations of Mendelian inheritance, as 
if it were an approximation to the product-sum correlation” (loc. cit. p. 651). 
It is a pity Mr Yule has not the courage of his opinions, and did not assert that 
the eye-colour data had a correlation of the Mendelian value 1/3 because they were 
Mendelian. Instead of that he throws the sop of 1/3 to the Mendelian Cerberus, 
having carefully produced it not from the proper fourfold Mendelian table, but by 
applying the method of pseudo-ranks, which involves just the same assumption 
of continuous variation and regression beyond the Mendelian divisions into unit 
characters, as is involved in the tetrachoric r_t. What Mr Yule means by “in all 
probability representations of Mendelian inheritance” is eloquently demonstrated 
by our discussion of the case (eye-colour in father and son) which he has himself 
selected to illustrate the Mendelian 1/3. 


If we leave the Mendelian theoretical table and any other truly discrete 
fourfold classifications, which are indeed difficult to find, and pass to the fourfold 
classification in general, what method are we to use? We assert that the 
fourfold or tetrachoric r_t is a coefficient of association infinitely superior to Mr Yule’s 
old Q or new ω. 


Mr Yule says he attaches no importance to the fluctuations of Q and ω for 
different divisions of the same table. If so, how can he usefully compare the 
values of Q or ω found from two tables with different divisions? If so, on 
what ground can he complain of the use of tetrachoric r_t because in certain 
cases it fluctuates for different divisions of the same table? He admits that 
“the coefficients of association and of colligation for different divisions of the 
same table in many cases fluctuate more largely than the normal coefficient” 
(loc. cit. p. 651), yet the only illustration he gives in his paper is that of the 
table for eye-colour in brothers, for which he discusses the relative fluctuations 
of his coefficients and tetrachoric r_t by a method which from the standpoint of 
scientific statistics is wholly inadequate. He states that “as soon as we leave the 
narrow field within which normal or ‘strained normal’ correlation holds good, the 
normal coefficient fluctuates as we change the axes of division quite as largely 
as any other coefficient” (loc. cit. p. 633). How does this “narrow field” tally 
with the “many cases” in which these other coefficients fluctuate more largely? 
Mr Yule has purposely selected three or four markedly skew distributions to 
show how the tetrachoric r_t fluctuates; why did he not test adequately and 
completely its fluctuation as against those of the coefficient of association on 
this material? Would he ever have written his above dogmatic assertion had 
he done so? 


Now we do not for a moment agree with Mr Yule that there is no importance 
in the fluctuations of a coefficient of association for different divisions of the same 
surface. On the contrary we assert that a true coefficient of association should 
be as stable as possible, that is to say that for a given surface it should have 
fluctuations which are within a reasonable range indicated say by twice its probable 
error. Indeed, for most practical purposes fluctuations of ·05 are of small 
importance. It is purely idle to do as Mr Yule has done, namely proceed to test 
such fluctuations by their ranges or their standard deviations obtained from the 
raw values. Each value must be considered in conjunction with its probable 
error. Q = 1 and r = ·70 are not comparable if the probable error of the first is 
zero and of the second ·05; the weight of the first observation is infinite and of 
the second finite. Mr Yule has compared such results without any regard to their 
relative weights. 
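The remark about weights can be made concrete with a minimal Python sketch. It assumes the rule the paper states further on, that each value is weighted with the inverse square of its probable error; the function name `weight` is illustrative only and not from the paper.

```python
import math

def weight(probable_error):
    """Inverse-square weight of an observation; infinite when the probable error is zero."""
    if probable_error == 0:
        return math.inf
    return 1.0 / probable_error ** 2

# The two cases named in the text: Q = 1 with zero probable error, r = .70 with probable error .05.
w_q = weight(0.0)
w_r = weight(0.05)
print(w_q, w_r)
```

Any weighted average that includes the first value is dominated entirely by it, which is why the two results cannot be compared on equal terms.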


Starting with tetrachoric r_t we have a definite surface, the Gaussian, either in 
its original or strained condition, for which there are no fluctuations in r_t except 
such as might arise from random sampling. No such surface has been discussed 
by Mr Yule for Q or ω. But is an approximation to the Gaussian surface a 
rarity? Does it only occur in “a narrow field”? On the contrary it covers 
within the approximations required in practice a very wide field, namely nearly 
all distributions in anthropometry and many characters in plant, insect and 
animal forms. Even the irregularities of the eye-colour data may well turn out 
to be due to neglect of the age-corrections and not to failure of the Gaussian 
system. Can Mr Yule produce, even with careful seeking, actual distributions, 
in number one-tenth of the Gaussian cases just referred to, for which Q and 
therefore ω are practically constant for all divisions? There is no reasonable 
and logically consistent theory of deviations which leads to a surface of constant Q, 
although there is such a theory leading to a surface of constant r_t. We hold that 
stability of an association coefficient is not only a desirable, but an essential part 
of any true theory of association, and the fact that one theory does give a relatively 
stable coefficient for a large section of material is immensely in its favour. 
But there are two other points about tetrachoric r_t which also in our opinion 
weigh much on its side when comparison is made with Q: 


(i) It is modified by every form of selection and thus corresponds to our 
experience of every true measure of relationship. If association is to be of profit, 
it should pass into correlation as our detailed knowledge of the material becomes 
greater. All Mr Yule’s association achieves is the passage with increasing 
knowledge into manifold and contradictory diversity. Every selection he has 
made on the basis of lesser knowledge must become an increased source of 
contradictory values, as he reaches more detailed knowledge of his material. 
There is a systematic variation of Q; it rises continuously from median to 
extreme divisions in every distribution on which we have tested it. That is 
to say, there is a wide divergence of practical statistical data from the surface 
of constant Q, and this in a definite direction. Nor is this to be wondered at, 
for the surface of constant Q has marked heteroscedasticity, and markedly curved 
regression: see Appendix III. 


(ii) A true coefficient of association should not necessarily become perfect, 
when one of the four quadrants of the fourfold table becomes zero*. Whatever 
the degree of dependence of two variates may be, the frequency surface in practice 
is limited in extent, therefore by taking small percentages of one or both variates 
in the marginal totals we can always in practice make one quadrant zero. We 
hold indeed that a fitting coefficient of association need not necessarily be perfect, 
even if two opposite quadrants have zero frequencies. 


In both of these respects tetrachoric r_t is superior to the coefficients of 
association or colligation. The one aspect in which it is inferior is the labour 
of calculation, but the tables published by Everitt and an additional table which 
we hope shortly will be published render the labour by no means severe. 


In order to establish our position it remains to show how even in the 
exceptional cases selected by Mr Yule, which do not represent the run of ordinary 
practice, the tetrachoric r_t is much more stable than his Q. 


* Consider the following Table for true r = ·52: 

                                  Father’s Stature. 
                              | Under 71·5 | Over 71·5 | Totals 
  Son’s      Under 65·5 ...   |   127·5    |     0     |  127·5 
  Stature.   Over 65·5 ...    |   863·5    |    87     |  950·5 
             Totals ...       |   991      |    87     | 1078 

What real knowledge do we gain by saying that association is perfect between “tall” sons and 
“tall” fathers, when it arises solely from the extreme position of the father-division not giving any 
content to a second quadrant on a limited surface of imperfect correlation? 

† See p. 177 and later Appendix I. 
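The point of the stature table can be checked numerically. The following is a minimal Python sketch, not part of the original paper: it evaluates Yule's coefficient of association Q = (ad − bc)/(ad + bc) and his coefficient of colligation ω = (√ad − √bc)/(√ad + √bc) for that table, reading a, b as the first row and c, d as the second. The function names are illustrative.

```python
import math

def yule_q(a, b, c, d):
    """Yule's coefficient of association for the fourfold table [[a, b], [c, d]]."""
    return (a * d - b * c) / (a * d + b * c)

def yule_omega(a, b, c, d):
    """Yule's coefficient of colligation for the same table."""
    s_ad, s_bc = math.sqrt(a * d), math.sqrt(b * c)
    return (s_ad - s_bc) / (s_ad + s_bc)

# Cell frequencies of the father and son stature table (true r = .52 according to the text).
a, b, c, d = 127.5, 0.0, 863.5, 87.0

q = yule_q(a, b, c, d)
omega = yule_omega(a, b, c, d)
print(q, omega)
```

Both coefficients come out as unity as soon as one cell is empty, whatever the other three cells contain, which is the behaviour the text objects to.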
(A) Barometric Data. 

The following table gives the value of tetrachoric r_t and of the coefficient of 
association Q*. 

                              Southampton and Laudale. 

 Divisions | Tetrachoric r_t |       Q       | Divisions | Tetrachoric r_t |       Q 
   29·15   | ·6968 ± ·0378   | ·9371 ± ·0144 |   29·95   | ·7991 ± ·0096   | ·8868 ± ·0075 
   29·25   | ·7254 ± ·0291   | ·9335 ± ·0124 |   30·05   | ·7954 ± ·0100   | ·8831 ± ·0075 
   29·35   | ·7476 ± ·0220   | ·9244 ± ·0108 |   30·15   | ·7971 ± ·0110   | ·8867 ± ·0075 
   29·45   | ·7410 ± ·0215   | ·9067 ± ·0109 |   30·25   | ·8112 ± ·0127   | ·9109 ± ·0070 
   29·55   | ·7303 ± ·0177   | ·8847 ± ·0111 |   30·35   | ·7983 ± ·0149   | ·9229 ± ·0076 
   29·65   | ·7559 ± ·0140   | ·8866 ± ·0097 |   30·45   | ·8300 ± ·0176   | ·9579 ± ·0061 
   29·75   | ·7848 ± ·0114   | ·8951 ± ·0083 |   30·55   | ·8452 ± ·0231   | ·9761 ± ·0050 
   29·85   | ·7917 ± ·0102   | ·8874 ± ·0079 |     -     |       -         |       - 



* We have not thought it needful to deal in every case with both Mr Yule’s coefficients. Since ω is 
a function of Q, and its probable error is a function of the probable error of Q, there is no occasion to 
do so, and it opens a whole field for the display of statistical fallacy. Consider two quantities r and ρ 
defined in their mutual relations by r = (1 − ε)/(1 + ε) and ρ = (1 − ε^{1/n})/(1 + ε^{1/n}); then if we arrange a 
fourfold table so that r is positive, ε will lie between 0 and 1 and ρ between 0 and 1. Clearly 

$$\epsilon = \frac{1-r}{1+r}, \qquad \epsilon^{1/n} = \frac{1-\rho}{1+\rho},$$

or taking logarithmic differentials we find 

$$\frac{\sigma_\epsilon}{\epsilon} = \frac{2\sigma_r}{1-r^2} \quad\text{and}\quad \frac{1}{n}\,\frac{\sigma_\epsilon}{\epsilon} = \frac{2\sigma_\rho}{1-\rho^2}.$$

Thus it follows that 

$$\frac{\sigma_\rho/\rho}{1/\rho - \rho} = \frac{1}{n}\,\frac{\sigma_r/r}{1/r - r}.$$

Now if n be positive r is always greater than ρ; it follows accordingly that 1/ρ − ρ is greater than 
1/r − r, but if we have any given series we can choose the value of n so great that σ_ρ/ρ is less than 
σ_r/r. For if 

$$u = \frac{\sigma_\rho/\rho}{\sigma_r/r}, \quad\text{then}\quad u = \frac{1}{n}\,\frac{\epsilon^{1/n}}{1-\epsilon^{2/n}}\,\frac{1-\epsilon^{2}}{\epsilon}.$$



As n becomes large, then the limit of 


1 el/n 1 


a ee me 
n1-e/n 2n 


Limit of u me. zy 


, 
2ne 


and therefore 
which can be made as small as we please. 


It will be seen that the values we have obtained are not wholly in agreement 
even to the second place of figures with the values obtained by Mr Yule, who has 
selected only 8 out of the 15 diagonal cases. Diagrams XVI and XVII show that 
Mr Yule’s selection has been singularly favourable to his unjustified assumption 
that the points lie along a straight line; actually the points are extremely steady 
between 29·85 and 30·35, and tail off to the terminals, where the percentages in the 
smallest quadrant are much smaller. However, apart from the question of linearity 
of the distribution of r_t, there is little doubt that the values are in defect for low 
divisions and in excess for high divisions. The question therefore is, are these 
defects and excesses such as to invalidate the use of the method of tetrachoric 
functions applied even to variates as skew as the barometer data? The only method 
of answering this question appears to us to consider the relation of the values found 
to their probable errors, and again the amount of stress which is likely in practice to 
be laid on the result deduced from a single fourfold table. We believe that 
practically it is not necessary for the type of reasoning based on a fourfold table that the 
correlation found should be nearer than ±·05. In the present case three coefficients 
exceed these limits, but if we proceed solely to two figures the first and last only 
lie outside these limits. Diagram XVI shows this result by the lines AA′ and 
BB′. The hatched part of the diagram corresponds to the region on either side 
of the product-moment value of 2·5 times the probable error. Actually some 10% 
of the observations should lie outside these limits, i.e. 1·5 observations instead 
of 3. But even these three have almost contact with the hatched area. It seems 
to us that in this very case of barometer data, which Mr Yule has chosen for 
its marked skewness to discredit tetrachoric r_t, the coefficient defeats Mr Yule 
completely! 

Diagram XVI. Correlation of Barometer at Laudale and Southampton. Actual values 
of tetrachoric r_t. (Vertical axis: value of tetrachoric r_t; hatched area = 2·5 times probable 
errors from the mean. Horizontal axis: height at which division was taken, 29·15 to 30·55.) 

Diagram XVII. Mr Yule’s representation of tetrachoric r_t. (Horizontal axis: height at which 
division was taken.) 

Thus if we have any system of tetrachoric r_t’s, it is always possible to devise a new system of ρ’s, 
obtainable by the relations 

$$\epsilon = \frac{1-r}{1+r} \quad\text{and}\quad \rho = \frac{1-\epsilon^{1/n}}{1+\epsilon^{1/n}},$$

which have σ_ρ/ρ less than σ_r/r. Thus, if we generalise Mr Yule, Ω = (1 − κ^{1/n})/(1 + κ^{1/n}) satisfies all his 
conditions of a coefficient of association, but we can always select n so that σ_Ω/Ω shall be less than 
σ_Q/Q or σ_ω/ω, where Q and ω are Mr Yule’s coefficients. Similarly we can deduce from the 
tetrachoric r_t a definite function of it ρ, which has a ratio σ_ρ/ρ less than σ_r/r, and yet satisfies all the 
so-called conditions of association. But for Mr Yule, and many others, the ratio of a quantity to 
its probable error is a measure of its significance. Hence by a proper choice of n we can always 
deduce from one of Mr Yule’s type of coefficients a second which has greater or less or no significance 
as the case may be. Further: 

$$\sigma_\rho = \frac{1}{n}\,\frac{1-\rho^2}{1-r^2}\,\sigma_r = u\,\sigma_r,$$

where 

$$u = \frac{1}{n}\,\frac{\epsilon^{1/n}\,(1+\epsilon)^2}{\epsilon\,(1+\epsilon^{1/n})^2},$$

but as n increases this tends to take the limit 

$$\frac{1}{n}\,\frac{(1+\epsilon)^2}{4\epsilon},$$

or can be made less than unity. In other words, given r_t or Q, we can always choose a new quantity ρ 
or Ω, which has a smaller absolute probable error. If colligation for a given series has a smaller 
probable error than Q or a smaller one than r_t, it is always possible to choose new functions ρ or Ω, 
which are absolutely determined by r_t or by Q and ω, which have still smaller probable errors, and 
yet are true Yulean coefficients of association, ranging between 0 and 1! The fact is that the term 
“probable error” has no meaning for these coefficients until Mr Yule has discussed the nature of their 
frequency distribution, which is certainly not Gaussian. In the text we have compared r_t and Q with 
their probable errors because it probably gives some rough estimate of their relative stability, and it is 
the test Mr Yule has himself chosen. 
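The closing relation of the footnote on derived coefficients, σ_ρ = u σ_r with u = (1/n)(1 − ρ²)/(1 − r²), can be checked numerically. The Python sketch below is an illustration only: it assumes the definitions ε = (1 − r)/(1 + r) and ρ = (1 − ε^{1/n})/(1 + ε^{1/n}) given in that footnote, treats u as the derivative dρ/dr (the usual first-order propagation of a small error), and compares the closed form with a finite-difference derivative. The function names and the trial value r = 0.5 are assumptions.

```python
def rho_from_r(r, n):
    """rho obtained from r through eps = (1 - r)/(1 + r) and the n-th root of eps."""
    eps = (1 - r) / (1 + r)
    e = eps ** (1 / n)
    return (1 - e) / (1 + e)

def u_closed_form(r, n):
    """u = (1/n) * eps^(1/n) * (1 + eps)^2 / (eps * (1 + eps^(1/n))^2)."""
    eps = (1 - r) / (1 + r)
    e = eps ** (1 / n)
    return e * (1 + eps) ** 2 / (n * eps * (1 + e) ** 2)

def u_numeric(r, n, h=1e-6):
    """Central finite-difference estimate of d(rho)/d(r)."""
    return (rho_from_r(r + h, n) - rho_from_r(r - h, n)) / (2 * h)

r = 0.5
for n in (1, 2, 5, 20):
    print(n, round(u_closed_form(r, n), 4), round(u_numeric(r, n), 4))
```

For n = 1 the factor is unity, and it falls steadily as n grows, which is the sense in which the absolute probable error of the derived coefficient can be made as small as one likes without any new information entering.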


Now let us compare the results with Mr Yule’s own coefficient of association. 
The difficulty in the comparison lies with the standard value of Q against which 
the other Q’s are to be compared. Had Mr Yule studied the surface of constant Q 
(what we may term the association-surface; see p. 184 above), then the Q 
corresponding to the best fitting association surface would have formed a standard Q. 
But we have at present no means of finding this standard Q, and Mr Yule tells us 
that he himself lays no stress on the diversity of values obtained for Q with different 
divisions. However, Mr Yule has himself taken the mean Q as a standard when 
he comes in a special case to deal with the relative stability of tetrachoric r_t and 
association Q. Accordingly we follow him in taking a mean Q. But he has gone 
astray in simply taking the arithmetical average of his Q’s or r_t’s without regard 
to their probable errors. The proper means to take are weighted means, and the 
proper standard deviations are weighted standard deviations. Each value must 
be weighted with the inverse square of its probable error as the measure of its 
grade of accuracy. With this weighting we find: 

                         Barometric Heights. 

                    | Weighted Mean | Weighted Standard Deviation 
  Association Q     |    ·9194      |    ·0346 
  Tetrachoric r_t   |    ·7886      |    ·0243 

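The weighted means for the barometric series can be retraced with a short Python sketch. The figures are transcribed from the Southampton and Laudale table of divisions; the weight of each value is the inverse square of its probable error, as the text prescribes. The weighted standard deviation is taken here as the square root of the weighted mean squared deviation, which is an assumption about the authors' working and may not reproduce every printed figure exactly. The function name is illustrative.

```python
import math

# (tetrachoric r_t, its probable error, Q, its probable error) for the 15 divisions 29.15 ... 30.55
rows = [
    (0.6968, 0.0378, 0.9371, 0.0144), (0.7254, 0.0291, 0.9335, 0.0124),
    (0.7476, 0.0220, 0.9244, 0.0108), (0.7410, 0.0215, 0.9067, 0.0109),
    (0.7303, 0.0177, 0.8847, 0.0111), (0.7559, 0.0140, 0.8866, 0.0097),
    (0.7848, 0.0114, 0.8951, 0.0083), (0.7917, 0.0102, 0.8874, 0.0079),
    (0.7991, 0.0096, 0.8868, 0.0075), (0.7954, 0.0100, 0.8831, 0.0075),
    (0.7971, 0.0110, 0.8867, 0.0075), (0.8112, 0.0127, 0.9109, 0.0070),
    (0.7983, 0.0149, 0.9229, 0.0076), (0.8300, 0.0176, 0.9579, 0.0061),
    (0.8452, 0.0231, 0.9761, 0.0050),
]

def weighted_mean_sd(values, errors):
    """Mean and standard deviation with weights 1 / (probable error)^2."""
    weights = [1.0 / e ** 2 for e in errors]
    total = sum(weights)
    mean = sum(w * v for w, v in zip(weights, values)) / total
    var = sum(w * (v - mean) ** 2 for w, v in zip(weights, values)) / total
    return mean, math.sqrt(var)

mean_r, sd_r = weighted_mean_sd([r[0] for r in rows], [r[1] for r in rows])
mean_q, sd_q = weighted_mean_sd([r[2] for r in rows], [r[3] for r in rows])
print(round(mean_r, 4), round(sd_r, 4))
print(round(mean_q, 4), round(sd_q, 4))
```

Because the weights fall off as the square of the probable error, the extreme divisions with large probable errors contribute little to either mean.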
It will be seen that the mean weighted tetrachoric r_t differs only by ·0086 from 
the true product-moment value (·7800). Further the variability of the association 
coefficient marks an increase of 42·4% on the variability of the tetrachoric r_t*; 
there can be no doubt that in this first of Mr Yule’s selected markedly skew cases 
the tetrachoric coefficient is far more stable than the association coefficient. 
Another method of approaching the problem will also illustrate this point. Let 
us express the deviations from the means in the case of both coefficients in terms 
of their probable errors, and then average these deviations: the mean value of 
(Q − mean Q)/(probable error of Q) = 3·35, the mean value of (r_t − mean r_t)/ 
(probable error of r_t) = 1·63. Clearly Q is 100% worse than r_t. This is indicated 
on the accompanying Diagram XVIII, where the values of Q and r_t are plotted to 
the percentage frequency of the smallest quadrant of the table, a matter which 
we have found a safe guide to determine how far we introduce risk in using 
tetrachoric r_t. It will be seen that the unsatisfactory values of r_t occur when this 
smallest quadrantal frequency is less than 2·5%. On the side of Q or r_t towards 
their mean, a line is drawn equal to 2·5 times the probable error; in all but one 
case of r_t this reaches the mean; in nine cases of Q it does not, and the deviations 
in six of these cases are excessive. There can, we think, be no doubt that 
tetrachoric r_t is in this first of Mr Yule’s selected cases far more stable than his 
coefficient of association. 

Diagram XVIII. Comparison of r_t and Q for Correlation of Barometric Heights. Q has 
nine cases, r_t only one case outside 2·5 times the probable error. 
Barometer Heights. Southampton and Laudale. (Horizontal axis: percentage in smallest quadrant.) 

* As both Q and r_t have the same possible range of variation −1 to +1, in such a case the 
standard deviation and not the coefficient of variation properly measures the variability. 
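The probable-error test for the barometric series can be sketched in Python as follows. The values are transcribed from the Southampton and Laudale table, and the printed weighted means ·7886 and ·9194 are used as the standards. The last lines add, as an assumption of Gaussian errors with the probable error equal to 0·67449 standard deviations, the chance that a single deviation exceeds 2·5 probable errors. Variable names are illustrative.

```python
import math

# (value, probable error) for the 15 divisions, as printed.
r_t = [(0.6968, 0.0378), (0.7254, 0.0291), (0.7476, 0.0220), (0.7410, 0.0215),
       (0.7303, 0.0177), (0.7559, 0.0140), (0.7848, 0.0114), (0.7917, 0.0102),
       (0.7991, 0.0096), (0.7954, 0.0100), (0.7971, 0.0110), (0.8112, 0.0127),
       (0.7983, 0.0149), (0.8300, 0.0176), (0.8452, 0.0231)]
q = [(0.9371, 0.0144), (0.9335, 0.0124), (0.9244, 0.0108), (0.9067, 0.0109),
     (0.8847, 0.0111), (0.8866, 0.0097), (0.8951, 0.0083), (0.8874, 0.0079),
     (0.8868, 0.0075), (0.8831, 0.0075), (0.8867, 0.0075), (0.9109, 0.0070),
     (0.9229, 0.0076), (0.9579, 0.0061), (0.9761, 0.0050)]

def scaled_deviations(rows, mean):
    """Absolute deviation of each value from the mean, in units of its own probable error."""
    return [abs(value - mean) / pe for value, pe in rows]

dev_r = scaled_deviations(r_t, 0.7886)
dev_q = scaled_deviations(q, 0.9194)
avg_r = sum(dev_r) / len(dev_r)
avg_q = sum(dev_q) / len(dev_q)
beyond_r = sum(d > 2.5 for d in dev_r)  # cases outside 2.5 probable errors
beyond_q = sum(d > 2.5 for d in dev_q)

# Two-sided Gaussian probability of exceeding 2.5 probable errors.
p_outside = math.erfc(2.5 * 0.6744897501960817 / math.sqrt(2))
print(round(avg_r, 2), round(avg_q, 2), beyond_r, beyond_q, round(p_outside, 3))
```

Under the Gaussian assumption between one and two of fifteen values would be expected outside 2·5 probable errors, so the count for r_t is about what chance allows while the count for Q is far beyond it.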


(B) Skew Table of Number of Mendelian Couplets in Father and Son 
(see Table XVI on our p. 222). 
The following table gives tetrachoric r_t and Q with their probable errors for 
a second series selected by Mr Yule. 

                              Vertical division taken between 
       |       0-1        |       1-2        |       2-3        |       3-4 
  0-1  | r_t = ·39 ± ·017 | r_t = ·37 ± ·017 | r_t = ·35 ± ·030 | r_t = ·33 ± ·086 
       | Q = ·49 ± ·018   | Q = ·51 ± ·023   | Q = ·63 ± ·048   | Q = ·75 ± ·153 
  1-2  | r_t = ·37 ± ·017 | r_t = ·40 ± ·017 | r_t = ·39 ± ·025 | r_t = ·37 ± ·066 
       | Q = ·51 ± ·023   | Q = ·52 ± ·019   | Q = ·61 ± ·029   | Q = ·72 ± ·087 
  2-3  | r_t = ·35 ± ·030 | r_t = ·39 ± ·025 | r_t = ·41 ± ·033 | r_t = ·40 ± ·073 
       | Q = ·63 ± ·048   | Q = ·61 ± ·029   | Q = ·69 ± ·034   | Q = ·79 ± ·068 
  3-4  | r_t = ·33 ± ·086 | r_t = ·37 ± ·066 | r_t = ·40 ± ·073 | r_t = ·48 ± ·122 
       | Q = ·75 ± ·153   | Q = ·72 ± ·087   | Q = ·79 ± ·068   | Q = ·90 ± ·072 


It is interesting to note the high values reached by the probable errors of 
tetrachoric r_t for the small quadrant frequency divisions. Both systems of 
probable* errors are based on the assumption that the total frequency is 4096. 

The following results were obtained: 

                    | Weighted Mean | Weighted Standard Deviation 
  Tetrachoric r_t   |    ·3809      |    ·0176 
  Association Q     |    ·5627      |    ·0840 

of applying tetrachoric r_t, but Mr Yule has applied it; however, the weighted 
mean of such method when applied is only ·048 in error, the contingency method 
giving ·3288 and the true value being ·3333. We see that, judged by weighted 
standard deviations, the relative variabilities are as 176 to 840, or the stability of 
the tetrachoric r_t is 4·8 times as great as that of Mr Yule’s coefficient. 

* In using probable errors at all here for a basis of comparison, we assume that the table may be 
considered to have arisen from actual observations. 


Diagrams XIX and XX. No Q differs by less than its probable error from the weighted mean, 
only one r_t differs by more than its probable error. 

We can look at this from the standpoint of the diagrammatic representation 
of the probable error (see Diagrams XIX and XX). In the lower figure we have 
the 16 points given by tetrachoric r_t; only a single one (shown by the individual 
linked with the non-black circle) is at a distance more than once the probable 
error from the weighted mean. In the upper figure not a single case occurs in 
which Q differs by so little as its probable error from the weighted mean value. 
There is indeed no comparison between the stability of tetrachoric r_t and Q in 
an illustration which Mr Yule himself has selected to indicate the badness of 
tetrachoric r_t! Mr Yule’s coefficient of association, we are told by the statisticians 
of the Royal Statistical Society, will be used for Mendelian problems as soon as 
the Mendelians know a little algebra, but here in the very case of Mendelian 
data we find, as a mere coefficient of association, the r_t of the despised Gaussian 
theory is immensely superior to the Yulean coefficient*. Judged algebraically: 

Average value of (Q − mean Q)/(probable error of Q) = 2·34, 

Average value of (r_t − mean r_t)/(probable error of r_t) = ·57, 

or Q is four times as unstable as tetrachoric r_t. 


(C) Severity of Small-Pox and Strength of Vaccination Immunity. 


Sixteen cases can be worked out for tetrachoric r_t and Q for this material, and 
there is a great variation in both r_t and Q†. 








                  | Haemorrhagic          | Confluent             | Abundant              | Sparse and 
                  | and Confluent         | and Abundant          | and Sparse            | Very Sparse 
  At 10           | r_t = ? ± ?           | r_t = ·3602 ± ·0911   | r_t = ·2857 ± ·0614   | r_t = ·2226 ± ·0626 
                  | Q = 1 ± ·0000         | Q = ·7892 ± ·1296     | Q = ·5415 ± ·1036     | Q = ·4319 ± ·1036 
  At 25           | r_t = ·2905 ± ·0598   | r_t = ·3694 ± ·0281   | r_t = ·3484 ± ·0249   | r_t = ·2500 ± ·0293 
                  | Q = ·5711 ± ·1079     | Q = ·5411 ± ·0413     | Q = ·4469 ± ·0303     | Q = ·3444 ± ·0369 
  …               | r_t = ·2187 ± ·0556   | r_t = ·4169 ± ·0309   | r_t = ·3961 ± ·0275   | r_t = ·2578 ± ·0332 
                  | Q = ·4111 ± ·0890     | Q = ·5714 ± ·0326     | Q = ·5474 ± ·0351     | Q = ·4143 ± ·0569 
  Between 45 and  | …                     | …                     | r_t = ·6022 ± ·0287   | r_t = ·5381 ± ·0349 
  Unvaccinated    | …                     | …                     | Q = ·8599 ± ·0308     | Q = ·8862 ± ·0524 

In the form in which Table XXII is given on our p. 254, all values of Q and r_t 
are negative. 


This table gives us the following results: 

                    | Weighted Mean | Weighted Standard Deviation 
  Tetrachoric r_t   |    ·3827      |    ·1211 
  Association Q     |    ·5902      |    ·1737 


* It is needless to say we should never have thought of applying either of these coefficients to 
theoretical Mendelian data; we hold that the correct method was the method applied to this very 
case by one of us ab initio, namely that of product-moment r. 

† There can be little doubt that the extreme variations are due to 9 out of the 16 divisions giving 
a quadrant of minimum frequency with less than 1% of frequency in it. 
In forming these weighted means we have left out the value of Q = 1 ± ·0000 
and of r_t = ? ± ?*. In the first place had we included this value of Q, its weight 
would have been infinite, and whatever the value of r_t and its probable error may 
be, it follows that Q must be more unstable than r_t. We have got rid of this 
anomalous Q, although Mr Yule’s theory gives it infinite weight, and dealt only 
with the remaining 15 cases; but to do this is like considering the assets of a 
bankrupt, after we have disregarded the claims of his principal creditor. The 
above table, however, shows that Q is, notwithstanding this disregard, much worse 
than r_t; in fact 43·4% must be added to the variability of r_t to reach that of Q. 

If we measure the deviations from the weighted means in terms of the probable 
errors as before we find for tetrachoric r_t the mean = 2·71 and for association 
Q the mean = 3·31. Thus again, although to a lesser extent since Q = 1 is excluded, 
we find tetrachoric r_t more stable than Q. 


(D) Lengths of Ivy Leaves. 


Mr Yule directly selected the lengths of growing ivy leaves as an especially 
skew distribution upon which to test tetrachoric r_t, and this in a case where the 
table had been deduced for homotyposis, i.e. with all the local lumpiness which 
arises from that method of treatment. If we take the 4 × 4-fold of our Table XVII 
on p. 222, nine values of Q and tetrachoric r_t are available. They are: 

       |         1-2          |         2-3          |         3-4 
  1-2  | r_t = ·6998 ± ·0046  | r_t = ·6406 ± ·0039  | r_t = ·5572 ± ·0058 
       | Q = ·8531 ± ·0027    | Q = ·8655 ± ·0040    | Q = ·9167 ± ·0085 
  2-3  | r_t = ·6406 ± ·0039  | r_t = ·5731 ± ·0033  | r_t = ·5218 ± ·0046 
       | Q = ·8655 ± ·0040    | Q = ·6768 ± ·0033    | Q = ·7570 ± ·0048 
  3-4  | …                    | r_t = ·5218 ± ·0046  | … 
       | …                    | Q = ·7570 ± ·0048    | … 








We have the following results: 

                    | Weighted Mean | Weighted Standard Deviation 
  Tetrachoric r_t   |    ·5920      |    ·0570 
  Association Q     |    ·8024      |    ·0757 





* On the difficulty, in fact idleness, of attempting to determine r_t from tables with zero frequency in 
one quadrant: see Appendix I. That quadrant as we there show might have ·5 in it, or, indeed, 
the material from which it is drawn unity. In the former case 20 terms in the r_t equation will not 
suffice to determine the value of r_t. In the latter case r_t might swing over from −1 to a small positive 
quantity. In fact the process of finding tetrachoric r_t in such cases warns its user that it is 
indeterminable. Users of Q will assert that the relationship is perfect with a zero probable error! 
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Therefore Q exhibits an increase of about 33°/,'on the variability of tetrachoric 7;. 
If we take the probable error test 


— weighted mean 7; = 11:53, 
probable error 





T" 
Average value of 


Average value of <= Seen oe 9 = 1619. 
probable error 


Thus Q represents an increase of 40°/, on the variability of 7;. 
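The two percentages just quoted can be checked directly from the printed figures. The following is a minimal sketch; the function name `percent_increase` is ours and is only illustrative.

```python
def percent_increase(larger: float, smaller: float) -> float:
    """Percentage by which `smaller` must be raised to reach `larger`."""
    return 100.0 * (larger / smaller - 1.0)

# Weighted standard deviations for the ivy-leaf table: Q against r_t.
sd_increase = percent_increase(0.0757, 0.0570)

# Average deviations in probable-error units: Q against r_t.
pe_increase = percent_increase(16.19, 11.53)

print(round(sd_increase), round(pe_increase))
```

Both ratios are taken with the r_t figure as the base, which is the sense in which the text speaks of raising the variability of r_t to reach that of Q.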


(E) Correlation of Hair and Eye Colours. Livi's Data†.


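The table is:

                                Eye-Colour.
Hair-Colour |   Blue   |   Grey   |   Brown   |  Black  |  Totals
------------+----------+----------+-----------+---------+---------
Blond       |    9083  |    8187  |     7031  |    217  |   24518
Red         |     343  |     518  |      819  |     37  |    1717
Brown       |   17829  |   39467  |   117522  |   4945  |  179763
Black       |    3627  |   13433  |    54883  |  20919  |   92862
------------+----------+----------+-----------+---------+---------
Totals      |   30882  |   61605  |   180255  |  26118  |  298860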


This table gives the following nine comparative results:

Hair division |      Blue-Grey       |      Grey-Brown       |      Brown-Black
--------------+----------------------+-----------------------+----------------------
Blond-Red     | r_t = .5307 ± .0026  | r_t = .5074 ± .0023*  | r_t = .4392 ± .0054*
              | Q   = .7442 ± .0023  | Q   = .7129 ± .0023   | Q   = .8422 ± .0067
Red-Brown     | r_t = .5239 ± .0030  | r_t = .5255 ± .0023*  | r_t = .4332 ± .0048*
              | Q   = .7356 ± .0023  | Q   = .7263 ± .0023   | Q   = .8294 ± .0067
Brown-Black   | r_t = .3601 ± .0025  | r_t = .3240 ± .0020   | r_t = .6449 ± .0020
              | Q   = .5791 ± .0041  | Q   = .4393 ± .0026   | Q   = .8365 ± .0016
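Each association coefficient in the nine-fold comparison comes from collapsing Livi's 4 × 4 table into a fourfold table at one hair division and one eye division, and then forming Q = (ad − bc)/(ad + bc). The sketch below illustrates this for three of the cells; the helper names `fourfold` and `yule_q` are ours, and the cut indices count how many leading rows or columns fall in the first class.

```python
# Livi's table: rows are hair colours (Blond, Red, Brown, Black),
# columns are eye colours (Blue, Grey, Brown, Black).
LIVI = [
    [9083, 8187, 7031, 217],
    [343, 518, 819, 37],
    [17829, 39467, 117522, 4945],
    [3627, 13433, 54883, 20919],
]


def fourfold(table, row_cut, col_cut):
    """Collapse a table into quadrant totals a, b, c, d.

    The first `row_cut` rows and first `col_cut` columns form quadrant a.
    """
    a = sum(sum(row[:col_cut]) for row in table[:row_cut])
    b = sum(sum(row[col_cut:]) for row in table[:row_cut])
    c = sum(sum(row[:col_cut]) for row in table[row_cut:])
    d = sum(sum(row[col_cut:]) for row in table[row_cut:])
    return a, b, c, d


def yule_q(a, b, c, d):
    """Yule's coefficient of association for a fourfold table."""
    return (a * d - b * c) / (a * d + b * c)


# Hair divided Blond | Red, eyes divided Blue | Grey.
q_blond_red_blue_grey = yule_q(*fourfold(LIVI, 1, 1))
# Hair divided Red | Brown, eyes divided Blue | Grey.
q_red_brown_blue_grey = yule_q(*fourfold(LIVI, 2, 1))
# Hair divided Brown | Black, eyes divided Brown | Black.
q_brown_black_both = yule_q(*fourfold(LIVI, 3, 3))
```

The tetrachoric r_t for the same fourfold tables requires solving the tetrachoric series and is not attempted in this sketch.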





Now it is of interest to compare the values of r_t found by fourfold tables with
those obtained by other methods. Mr Yule has nothing but the method of
pseudo-ranks to apply to the table in its detailed form. This gives a Yulean
.3680; the value found by corrected mean square contingency is .5189; the
weighted mean of the nine tetrachoric r_t values is .4842, much nearer to the
contingency value. Indeed the Yulean, if corrected for ranks, becomes .3953, and if
corrected for class-index correlations becomes .5051, i.e. differs quite insignificantly


† Antropometria militare, Part I. p. 62.
* Mr Yule's values do not agree to the second decimal place with ours in these cases.

from the contingency .5189*. It is thus easy to see why Mr Yule gets such
an absurdly low value. If it be said that two out of the nine values of
tetrachoric r_t are lower than the Yulean, the reply is a simple one: They are not
divisions which would be made with our present knowledge of hair and eye
pigmentation. There is no such thing as a “black” eye, at best it is only a dark
shade of brown, and the division between brown and black is largely a matter of
personal equation. Again the relative amount of pigmentation in a true grey eye
may be as small as in many so-called blue eyes. The natural physiological division
is between the blue-grey and the brown-black groups, although, if Livi put his
hazels and greys with some pigment partly into blues and partly into browns, a
division between blue and grey might give as good results. Passing now to the
hair-colour, we believe the distinction between brown and black to be again a
matter of personal equation; the shades of brown range up to black. Again red
is a tint which may contain less pigment granules than the blond†, but in some
varieties it has more granules than many browns. Hence the difficulty of such
a division as that betwixt brown and red. The real division would come between
blond and brown with reds omitted or better still microscopically tested and
placed in their appropriate division. Anyhow the divisions corresponding to the
four left-hand top quadrants of the last table seem to us the most satisfactory, and
they all agree in giving the correlation of hair and eye colours = circa .52, a result
in excellent accord with the mean square contingency.
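The weighted mean of the nine tetrachoric values quoted in this discussion, .4842, can be recovered from the printed coefficients if each is weighted by the inverse square of its probable error. That weighting is our assumption; it is not stated at this point of the text, though it reproduces the printed mean. A minimal sketch, with illustrative names:

```python
import math

# (value, probable error) for the nine tetrachoric r_t of the hair and eye table.
R_T = [
    (0.5307, 0.0026), (0.5074, 0.0023), (0.4392, 0.0054),
    (0.5239, 0.0030), (0.5255, 0.0023), (0.4332, 0.0048),
    (0.3601, 0.0025), (0.3240, 0.0020), (0.6449, 0.0020),
]


def weighted_mean_sd(pairs):
    """Weighted mean and standard deviation, weights = 1 / (probable error)^2."""
    weights = [1.0 / pe ** 2 for _, pe in pairs]
    total = sum(weights)
    mean = sum(w * v for w, (v, _) in zip(weights, pairs)) / total
    var = sum(w * (v - mean) ** 2 for w, (v, _) in zip(weights, pairs)) / total
    return mean, math.sqrt(var)


mean_rt, sd_rt = weighted_mean_sd(R_T)
print(f"weighted mean = {mean_rt:.4f}, weighted s.d. = {sd_rt:.3f}")
```

The standard deviation obtained this way is of the same order as the printed weighted standard deviation; exact agreement should not be expected, since the probable errors are given to four places only.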


Pearson's data for British school children give by corrected contingency .52,
and the following table shows that the values for all nations must be in excess of
that determined by Mr Yule's method:

Correlation of Hair and Eye Colours by corrected Contingency.

Italian Recruits ...............................  .519
British School Children ........................  .524
Baden Recruits .................................  .484
German Jewish Children .........................  .444
Prussian School Children .......................  .401
Swedish Recruits ...............................  .414
Italians by Yule's Method of Pseudo-Ranks ......  .368


* Assuming normal correlation, the correction for ranks is r = 2 sin (30° × .3680) = .3953. For hair
groups the class-index correlation is .8686 and for eyes .9010, and variate correlation

    = .3953/(.8686 × .9010) = .5051.

Of course these are only rough approximations, for Mr Yule's theory of unit subranges precludes our
using the corrections actually given. But any corrections will be roughly of this order.
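The second step of this footnote, the correction for class-index correlations, is a simple division of the grouped coefficient by the product of the two class-index correlations. A small sketch of that arithmetic follows; the function name is ours.

```python
def correct_for_class_indices(r_grouped, class_index_x, class_index_y):
    """Divide a grouped correlation by the product of the class-index correlations."""
    return r_grouped / (class_index_x * class_index_y)


# Rank-corrected Yulean .3953, class-index correlations .8686 (hair) and .9010 (eyes).
corrected = correct_for_class_indices(0.3953, 0.8686, 0.9010)
print(round(corrected, 4))
```

Only the class-index step is illustrated here; the rank correction is taken as printed.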

† Some reds lack all granular pigmentation; there are blonds which also do this, but they are very
few in number. Both reds and blonds without melanin pigment granules are truly albinotic hairs, 
whether accompanied or not by albinotic eyes. 










Returning to the relative values of tetrachoric r_t and association Q we find:

                   | Weighted Mean | Weighted Standard Deviation
-------------------+---------------+----------------------------
Tetrachoric r_t    |     .4894     |           .1102
Association Q      |     .7258     |           .1161








The variation is high for both coefficients, but even here tetrachoric r_t is better
than Q; you must raise the variability of r_t 5.4% to reach the variability of Q.
But the worth of the two coefficients is wholly unequal. If we are told the
correlation of hair and eye colours is about .50, a whole series of ideas is associated
with this number; a very little additional information gives the relative standard
deviations of the two variates, accurately enough for practice, and we picture to
ourselves the regression lines and the associated changes in pigmentation of hair
and eye classes. But what does an association coefficient of .73 for special
dichotomies tell us? We venture to assert that it conveys no information whatever
to the investigator's mind, and is absolutely incomparable with other association
coefficients of the same table, because each for the same system depends on the
values of the variates at which the divisions are made, and because it has not the
least relation with any physical properties of the distribution. If Mr Yule replies
to these criticisms, that tetrachoric r_t is also unstable, if not to the same extent,
and that a function of Q does provide (if the table be doctored) a certain difference
of percentages, we answer that tetrachoric r_t is far from so unstable for the
distributions of ordinary practice as he has endeavoured to make it out by selecting:
(i) pigmentation data, which as long ago as 1901 were recognised as irregular, and
(ii) markedly skew frequencies*. The instability is many times compensated by
the definite physical significance of the coefficient.


Further the percentages which Mr Yule’s deduced coefficient of colligation 
represents are wholly artificial and incapable of any rational interpretation. If we 
were to equalise the number of vaccinated and unvaccinated in any locality, how 
could we equalise the number of deaths and the number of recoveries, and what 
intelligible meaning can be given to the percentages when it has been done ? 
How can we possibly give any interpretation to the result reached by a disciple 
of Mr Yule that the “index de corrélation”—i.e. Mr Yule’s Q—between the 
stature of recruits and rent in the 20 districts of the city of Paris is “ perfect” ? 
Average stature of recruits for different districts forms a continuous variate system, 
so does average rent, and the two properly correlated would show the nature of 
the regression line, but this disciple of Mr Yule’s, in order to save a little absolutely 


* Mr Yule (loc. cit. p. 624) speaks of the selection he has made as “exhibiting moderately skew
distributions.” Unless he means that they are not U- or J-shaped curves, we consider this an entire
misnomer.

needful arithmetic, tells us that the index of correlation between stature and rent
is perfect*!


(F) Eye-Colour Data of Pearson. 


We now come to the eye-colour data. These are the only cases we have so
far worked out in which the variability of Q is at all comparable with that of
tetrachoric r_t, and the reason is not very far to seek. We have already pointed
out that the true difficulty in these eye-colour tables appears to lie in lumps
of excess frequency in or near the corner cells of the quadrants of less frequency
and that thus the material is heterogeneous in character. We think it quite
probable that this is due to the inclusion of really senile fathers on the one hand
and of infant sons on the other, or of pairs of brothers or sisters one of whom is
an infant. In this manner eye tints which are originally, or will become, mediocre
in colour may be classified as very light. The record which Sir Francis Galton
provided gave no ages; the original data are now in the possession of the Galton
Laboratory and it is proposed to reconsider both these and the Huxley Lecture
data for eye-colour†, paying attention to change of eye-colour with age. This
investigation will necessarily take a considerable time and it might be wiser
to await its conclusion before entering further on this topic. But we should no
doubt be told that we were omitting just those cases that appeared favourable
to Mr Yule's association coefficient, and accordingly we have included the
eye-colour data here. The table on p. 274 gives the Brother-Brother coefficients for
16 divisions.


Now this is a symmetrical table and Mr Yule reckons the diagonal coefficients 
once, and the repeated coefficients twice, but we are doubtful of the accuracy of 
this process. Such a symmetrical table leads to exactly the same results from 
symmetrically placed divisions, and it is not clear why double weight should be 
given to such a division as 2—3 and 6—7 because as the table is written out 
it occurs twice. It seems to us that the diagonal values and all on one side of 
them are the only independent coefficients ; we have lost the independence of the 
coefficients in the cells symmetrically situated with regard to the diagonal by 
the very process of adding the tables for First Brother in terms of Second Brother 


* A. Niceforo, “Contribution à l'étude des corrélations entre le bien-être économique et quelques
faits de la vie démographique,” Journal de la Société de Statistique de Paris, 52 Année (1911),
pp. 322-341. Professor Niceforo studies “corrélations” by aid of the coefficient of association,
which he applies to the following continuous variates: stature, rent, probable income, numbers of
illiterates, of insanitary dwellings, of paupers, of pauper funerals, of workers, size of families, numbers
of cubic feet of air space, of inhabitants to the acre, special death-rates from all sorts of diseases,
general death-rates and birth-rates etc., etc. We believe that the whole of this work must be redone.
Even if a rough estimate had been required, Mr Yule's coefficient should not have been used, but the
division made at the median values and Sheppard's formula for tetrachoric r_t adopted. There is not
even the excuse of apparent discreteness in any of Professor Niceforo's attributes.

† Pairs of siblings in the same school are much more nearly of an age than any pair between the
ages of 5 and 15 taken from the population of school children. While the head-measurements were
corrected for age, the hair and eye colours were not, and probably the changes are more important than
we believed at that date.

                                        First Brother.
Second  |         2-3          |         3-4          |         4-5          |         6-7
Brother |                      |                      |                      |
--------+----------------------+----------------------+----------------------+----------------------
  2-3   | r_t = .5058 ± .0172  | r_t = .3761 ± .0184  | r_t = .2428 ± .0212  | r_t = .2373 ± .0240
        | Q   = .6214 ± .0176  | Q   = .4920 ± .0230  | Q   = .3434 ± .0304  | Q   = .3653 ± .0383
  3-4   | r_t = .3761 ± .0184  | r_t = .5185 ± .0160  | r_t = .3524 ± .0196  | r_t = .2777 ± .0230
        | Q   = .4920 ± .0230  | Q   = .6229 ± .0165  | Q   = .4603 ± .0232  | Q   = .3963 ± .0301
  4-5   | r_t = .2428 ± .0212  | r_t = .3524 ± .0196  | r_t = .3677 ± .0209  | r_t = .2978 ± .0246
        | Q   = .3434 ± .0304  | Q   = .4603 ± .0232  | Q   = .5089 ± .0231  | Q   = .4306 ± .0301
  6-7   | r_t = .2373 ± .0240  | r_t = .2777 ± .0230  | r_t = .2978 ± .0246  | r_t = .3128 ± .0276
        | Q   = .3653 ± .0383  | Q   = .3963 ± .0301  | Q   = .4306 ± .0301  | Q   = .4705 ± .0326








and Second Brother in terms of First, and the repetition of the same numbers in
a second cell does not give those numbers double weight. Had we worked our
tables for Elder Brother and Younger Brother, each cell would have had
independent weight, but adding them reversed we have lost one-half of the
non-diagonal independent frequencies, and we must not still retain the same
number of independent weights. Relative to Mr Yule's method of procedure, this,
in our opinion, true method of weighting emphasises in symmetrical tables the
diagonal columns and would correspondingly better tetrachoric r_t as against Q.
But we have not used what we consider the true weighting, because it might be
said that it had been adopted with a view to bettering our position. Arranging as
before we have the following results for the 16 coefficients:

                   | Weighted Mean | Weighted Standard Deviation
-------------------+---------------+----------------------------
Tetrachoric r_t    |     .3474     |           .0931
Association Q      |     .5083     |           .0952


| 

Thus the positions of r_t and Q are just reversed by proper weighting*. The
coefficient of variation has no meaning, we hold, in the case of mere numerics like
r_t and Q, both of which may range from −1 to +1 in the general case. Indeed
the case is more complex than can be accurately determined even by weighted
standard deviations. For Q, although nominally ranging between ±1, is for any
given case numerically greater than the corresponding tetrachoric r_t, and thus
their variabilities if not functionally related are related by limitations. For
reasons already given we see no advantage in considering the probable error of


* Mr Yule (loc. cit. p. 634) gives them as .084 and .081. If we take into account all the reasonably
possible divisions, 36 in number, i.e. 1-2, 2-3, 3-4, 4-5, 6-7, 7-8, the standard deviations are respectively
σ_r = .121 and σ_Q = .130, while the ranges, on which Mr Yule appears to lay stress, take for r_t the value
.59 and for Q the value .92!

colligation (it is quite easy to deduce another “coefficient of association” from
Q, which will have a still less probable error than ω, or one from tetrachoric r_t
having a less value than both: see p. 262 ftn.). Further we do not consider that the
product-moment method (i.e. φ, which Mr Yule erroneously terms the correlation
coefficient) has any application to these eye-colour data; it is purely idle to deal
with the difference between a ‘light brown’ and ‘dark brown’ eye as a discrete
‘unit.’

We next turn to the Father and Son Eye-Colour data. The table for 
16 divisions is: 

                                   For Father.
For Son |       2-3         |       3-4         |       4-5         |       6-7
--------+-------------------+-------------------+-------------------+-------------------
  2-3   | r_t = .504 ± .029 | r_t = .405 ± .031 | r_t = .385 ± .035 | r_t = .316 ± .041
        | Q   = .616 ± .030 | Q   = .528 ± .038 | Q   = .549 ± .050 | Q   = .487 ± .064
  3-4   | r_t = .391 ± .031 | r_t = .550 ± .027 | r_t = .493 ± .032 | r_t = .421 ± .038
        | Q   = .500 ± .038 | Q   = .658 ± .027 | Q   = .632 ± .034 | Q   = .579 ± .045
  4-5   | r_t = .276 ± .035 | r_t = .466 ± .031 | r_t = .575 ± .032 | r_t = .517 ± .037
        | Q   = .381 ± .049 | Q   = .590 ± .034 | Q   = .716 ± .028 | Q   = .684 ± .035
  6-7   | r_t = .266 ± .040 | r_t = .374 ± .038 | r_t = .457 ± .040 | r_t = .512 ± .039
        | Q   = .402 ± .062 | Q   = .519 ± .047 | Q   = .622 ± .010 | Q   = .695 ± .037





In this case we find:

                   | Weighted Mean | Weighted Standard Deviation
-------------------+---------------+----------------------------
Tetrachoric r_t    |     .443      |           .086
Association Q      |  [illegible]  |           .086





If we thought range was a measure of variation we should have:
    Range of tetrachoric r_t = .31.
    Range of association Q = .34.
It is clear that for these eye-colour tables Q is almost as stable as
tetrachoric r_t. Indeed judged by the probable error test Q is slightly better than
tetrachoric r_t, for we have

Mean value of               | Brothers and Brothers | Fathers and Sons
----------------------------+-----------------------+-----------------
(r_t − r̄_t)/(p.e. of r_t)   |          3.5          |       2.2
(Q − Q̄)/(p.e. of Q)         |          3.2          |       1.9





The weighted standard deviation test, which is probably a better one, shows
that r_t is superior to Q for the Brothers table and equal to it for the Father and
Son table; both are terribly bad and a slight change in the nature of the test
may make one or other apparently slightly superior. In this eye-colour material
in its present state we are quite clear that the tetrachoric method applied
to extremely skew divisions will not give consistent results. But this had been
stated years ago, in passages which Mr Yule refrains from quoting. We are
equally clear that Q gives no better results, while for the bulk of tables of
statistical practice, it certainly gives less stable results than r_t.


(G) Age of Husband and Wife Data. 


This is a case of extreme skewness* selected by Mr Yule to show how
tetrachoric r_t varies. It is also a case of heterogeneity, for second marriages are mixed
up with first marriages. With his usual ingenuity Mr Yule has exhibited a curve
and a table, in which he has increased the number of the terminal values, where
tetrachoric r_t gives too low values and has little weight, and shown few of the
central values where r_t is relatively steady and has great weight. Further, he
tells us that:

“From the standpoint of the calculator, however, the table presents the
disadvantage that the correlation is high, viz. .91, and the approximation to the
value of the normal coefficient correspondingly slow, eight to ten or twelve terms
of the equation being necessary to give a value fairly trustworthy in the second
place of decimals” (loc. cit. p. 624). Mr Yule may have got results “fairly
trustworthy in the second place of decimals,” but as either the odd or even series
of powers of r_t in the equation rarely becomes convergent till much beyond the
twelfth term (we have had in some cases to go to 18 or 20 terms) the confidence
Mr Yule put in his values was quite unjustified. The case is an interesting one
because the true correlation is very high, i.e. .9253 ± .00004†. Accordingly,
Mr Yule's association coefficient whatever division is taken is constrained to lie
between something like .952 and 1. It is therefore difficult to compare the range
of instability of Q with that of tetrachoric r_t. We are bound to consider both
in relation to their possible ranges.


That the curve given for tetrachoric r_t indicates no defect in r_t relatively
to other coefficients, will be at once appreciated by comparing it with the curve
for Mr Yule's coefficient of colligation ω! That coefficient varies just as much, only
it is the other way round: see Diagram XXI, p. 278. As for the Boas-Yulean
coefficient φ, we could only assume from it, that there was, when the dichotomies
were at young or old ages, a low relation between the ages of Husband and Wife.

* The actual skewnesses are: for Husbands Sk. = .71 and for Wives Sk. = .76. These are among
the highest values on record for skewness, and the surface is not “moderately skew” as Mr Yule
without publishing any numbers (loc. cit. p. 624) states it to be.

† The regression line is sensibly curved at the terminals, but this does not markedly influence η,
which uncorrected is .9142, as against r = .9136 without Sheppard. With Sheppard r rises to .9253.
Mr Yule gives .91 for this correlation, which appears to us an uncorrected value.

Yet according to Mr Yule φ is “applicable in its entirety to the 2 × 2-fold
table.” Of course we may “apply” any method to any problem, but whether
we shall obtain anything of value from the application is another question,
and Pearson's original statement about Boas' coefficient that “it differs in the
simplest cases from the true coefficient of correlation, and often differs
considerably...and its use is liable to be misleading, especially if compared with
values of the true coefficient found by other processes*” was amply justified and is
well illustrated by this case. The tetrachoric r_t values cluster round the true
value, the Boas-Yulean never reaches it. In the following table the values of
the tetrachoric r_t, of the coefficient of association Q, of the coefficient of colligation
ω, and of the Boas-Yulean, Pearson's φ, are recorded with their probable errors for
all divisions at like ages, since these are the divisions selected by Mr Yule.


Ages of Husband and Wife. Values of the various Coefficients
proposed to measure Association.

Division    |      r_t        |        Q         |       ω         |       φ
of age      |                 |                  |                 |
------------+-----------------+------------------+-----------------+----------------
18          |       —         | .9991 ± .00018   | .9587 ± .0004   | .0711 ± .0097
[illegible] |       —         | .9965 ± .00020   | .9197 ± .0022   | .1240 ± .0042
20          | .7755 ± .0025   | .9932 ± .00016   | .8895 ± .0012   | .2128 ± .0024
25          | .8813 ± .0030   | .9734 ± .00010   | .7919 ± .0003   | .5644 ± .0005
30          | .9302 ± .0001   | .9745 ± .00006   | .7958 ± .0002   | .7137 ± .0002
35          | .9522 ± .0001   | .9795 ± .00005   | .8151 ± .0002   | .7735 ± .0002
40          | .9535 ± .0001   | .9821 ± .00004   | .8265 ± .0002   | .7948 ± .0002
45          | .9630 ± .0001   | .9846 ± .00004   | .8381 ± .0002   | .8003 ± .0002
50          | .9601 ± .0001   | .9850 ± .00004   | .8401 ± .0002   | .7867 ± .0002
55          | .9570 ± .0001   | .9863 ± .00004   | .8468 ± .0002   | .7658 ± .0002
60          | .9471 ± .0001   | .9864 ± .00005   | .8471 ± .0002   | .7260 ± .0003
65          | .9350 ± .0002   | .9881 ± .00005   | .8565 ± .0003   | .6733 ± .0005
70          | .9159 ± .0003   | .9897 ± .00006   | .8656 ± .0003   | .5947 ± .0007
75          | .8917 ± .0006   | .9922 ± .00006   | .8823 ± .0005   | .4932 ± .0012
80          | .8450 ± .0020   | .9946 ± .00008   | .9013 ± .0007   | .3512 ± .0024
85          | .8081 ± .0024   | .9975 ± .00010   | .9317 ± .0013   | .2046 ± .0046

| | 


| 


The first two divisions were not used because the work of calculating
tetrachoric tables to the large number of terms in r_t required for getting results even
approximately correct is excessive when r_t is large and the dichotomies extreme.
It will be seen that we very frequently differ by a unit in the second figure of
r_t from Mr Yule's results and that our diagram (Diagram XXI) differs from his
for r_t by being sensibly nearer to the true correlation. He has stopped his series
before it began to converge properly.
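The columns for Q and for the coefficient of colligation ω in the table of coefficients are not independent: for any fourfold table Yule's two coefficients satisfy Q = 2ω/(1 + ω²), so that ω can be recovered from Q alone. The sketch below checks this against two printed rows; the function names are ours.

```python
import math


def colligation_from_q(q: float) -> float:
    """Solve Q = 2w / (1 + w^2) for the root with |w| <= 1."""
    if q == 0:
        return 0.0
    return (1.0 - math.sqrt(1.0 - q * q)) / q


def q_from_colligation(w: float) -> float:
    """Yule's association coefficient expressed through the colligation."""
    return 2.0 * w / (1.0 + w * w)


# Printed Q at the divisions 30 and 45.
w_30 = colligation_from_q(0.9745)
w_45 = colligation_from_q(0.9846)
print(round(w_30, 4), round(w_45, 4))
```

Agreement with the printed ω is to about the fourth decimal only, since the printed Q values are themselves rounded.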





The following gives the percentage in the least quadrant at each age division : 

















1°92 | 2°13 | 2°12 


| 
0003} 005] “031 | 962 





| 
1-28 | 1-69 | 1°35 











* Science, Vol. xxx. p. 24.

Diagram XXI. Diagram showing variations of various coefficients for fourfold tables with diagonal
dichotomies in the case of Ages at Marriage of Husband and Wife.

The reader may ask what are our actual grounds for neglecting the results
obtained from extreme divisions in the case of tetrachoric r_t? The answer is that
it is not only experience of how slight skewness at extreme divisions produces large
changes in the value of r_t, but our knowledge that the weights of these outlying
divisions are in the case of r_t (to a lesser extent in the case of φ) insignificant
as compared with the weights of the central divisions. No such grave differences
exist in the case of Q or (to a slightly lesser extent) in the case of ω. The
following table gives the weights, treating the weights of the division at 20
as unity.











Division |  r_t   |   Q    |   ω    |   φ
---------+--------+--------+--------+--------
   20    |    1   |   1    |   1    |    1
   25    |   69   |   2.7  |  12.5  |   24.5
   30    |  370   |   6.1  |  27.4  |   97.9
   35    |  772   |  10.9  |  36.2  |  171.9
   40    |  977   |  14.3  |  40.1  |  187.7
   45    | 1276   |  16.7  |  44.7  |  153.1
   50    |  977   |  15.8  |  40.1  |  145.6
   55    |  625   |  15.0  |  31.4  |   97.1
   60    |  319   |  11.9  |  25.2  |   55.4
   65    |  142   |  10.0  |  18.5  |   22.6
   70    |   54   |   7.7  |  12.5  |   10.6
   75    |   16   |   5.9  |   6.9  |    3.8
   80    |    1.5 |   3.7  |   3.0  |    1.1
   85    |    1.1 |   2.5  |   0.9  |    0.3


| 








It will be clear: first that it is idle to measure the variations of these quantities
from anything but a weighted mean, and secondly that no one would after seeing
such weights dream of determining tetrachoric r_t from extreme divisions. If we
omit the first two and the last two values of r_t, noting their slight weight, then
the remaining values in no case differ by .04 from the true correlation and the
mean divergence is .015 only. Mr Yule's Q must be steadier here than r_t, because
its range is limited by the nature of the case to about .05, while there is no limit
to the range of the latter.
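The weights tabulated above behave like inverse squares of the probable errors, scaled so that the division at 20 has unit weight. This reading is our assumption; it agrees with the printed entries for the outer divisions, where the probable errors are given to enough figures, but the central entries cannot be reproduced from probable errors rounded to four places. A brief sketch, with an illustrative function name:

```python
def relative_weight(pe_reference: float, pe: float) -> float:
    """Weight of a coefficient relative to a reference, taking weight = 1 / p.e.^2."""
    return (pe_reference / pe) ** 2


# Tetrachoric r_t: probable error .0025 at division 20, .0020 at 80, .0024 at 85.
w_rt_80 = relative_weight(0.0025, 0.0020)
w_rt_85 = relative_weight(0.0025, 0.0024)
print(round(w_rt_80, 2), round(w_rt_85, 2))
```

The results stand close to the tabulated 1.5 and 1.1 for r_t at the divisions 80 and 85.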


(H) Association in a typical table of Ordinary Statistical Practice. 


Thus far we have dealt with the relative stability of the coefficient of
association and tetrachoric r_t on material especially selected by Mr Yule to
exhibit the variable character of tetrachoric r_t. But in concluding this branch
of our discussion we should like to exhibit the relationship of the two coefficients
for such surfaces as occur in ordinary statistical practice. For this purpose we
will make no selection ourselves, but illustrate the matter on the correlation table
chosen by Mr Yule himself in his first memoir on Association*. We take the
table as it is given without any knowledge of its degree of approach to the
Gaussian or of the material with which it deals. Its coefficient of correlation by
the product-moment method = .677. Now we commenced by leaving out all the


* See Phil. Trans. Vol. 194 A, p. 277.

border cases of the table which give Q = ±1 with infinite weight. Also the
cases adjacent to these where Q, depending upon very small frequency in one
quadrant, has a very high value, a low probable error, and accordingly a very
great weight. With the exception of this we took every alternate case in every
row and so obtained 41 coefficients; the corresponding tetrachoric r_t's were
likewise calculated. The results are exhibited in the accompanying table, the
notation being that adopted by Mr Yule to mark the divisions.


There is practical certainty that any extension of the boundaries of this system
of divisions could only better the position of r_t relative to Q. We obtained the
following results:

    Weighted Mean Q = .834.        Weighted Mean r_t = .672.
    Weighted S.D. of Q = .0513.    Weighted S.D. of r_t = .0478.


We must therefore increase the variability of tetrachoric r_t by 28% to reach
the variability of Q. This shows the real degree of difference in the stabilities
of r_t and Q for the correlation-tables of ordinary practice. We note also that the
weighted mean of r_t differs by only .005 from the true product-moment value,
.677, of the correlation.


Now let us approach the matter from the standpoint of probable error. We
have:
    Mean (r_t − r̄_t)/probable error = 1.06.
    Mean (Q − Q̄)/probable error = 1.72.

In the case of r_t − r̄_t we have 21 below their probable error in value and 20
above it, just what there should be. Only 5 exceed twice the probable error and
there should be 4. In other words the distribution of r_t in terms of its probable
error might well have arisen from random sampling of Gaussian material. Now
turn to Q: in terms of the probable error Q shows an increase of upwards of
62% on the value for r_t. There are only 15 values of Q − Q̄ below this probable
error, 26 in excess of it. There are 15 values instead of 4 in excess of twice
the probable error. There are five values in excess of three times their probable
error, compared with only two occurring in the case of r_t. It is, we think, obvious
that the variations in the case of Q are far greater than those due to random
sampling*. Our Diagrams XXII and XXIII (pp. 282-3) indicate two points.
In the upper parts of these diagrams we have plotted r_t and Q to the percentage of
frequency in the quadrant of least frequency. We see at once that if we avoid
quadrantal frequencies under 1% the value of tetrachoric r_t is for practical purposes
equal to the true product-moment r. The reader will recognise the far greater
scatter of Q. In the lower figures we have plotted r_t and Q with relation to their
probable errors; the full dot denotes the observation, the open dot the end of a line

* The reader must remember that there is no reason to assert that the errors must be of the order of 
random sampling; there is only one table, not many random samplings from a much larger mass, and 
we take different divisions. But we assert that if the errors be of that order, then the method is as good 
as the data warrant our using. 


Biometrika 1x 36 










equal to twice the probable error; where this open dot is seen, there twice the
probable error fails to reach the true value of r_t or the mean value of Q as the
case may be. The failures of Q are seen to be three times as many as those of r_t.


Diagram XXII. Values of association Q for a typical table of ordinary statistical practice.
(Panel label: Values for percentages under unity.)


We may conclude justifiably that even asa coefficient of association r, is much 
more stable than Q for the usual type of distribution; and we have already seen 


that this is so even for many marked cases of skewness. 
mation and often a very good approximation to the true correlation. 


It is always an approxi- 
On the 


other hand Q conveys no idea to the mind at all, except in as far as it is an 


approximation, usually very bad, to the same correlation. 
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Calculated values of 1; (fourfold tables). 
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General Protest against the Use of Mr Yule’s Coefficient of Association. 


It may be said that a vigorous protest against Mr Yule’s coefficients is 


unnecessary. 


We believe on the contrary that, if not made now and made 


strongly, there will be a great set-back to both modern statistical theory and 


Diacram XXIII. Values of r, for a typical table of ordinary statistical practice. 
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with twice their probable errors. 


on enlarged hori 


modern statistical practice. The publication of Mr Yule’s text-book has resusci- 
tated the use of his coefficient of association; it is now being used in all sorts of 


quarters for all sorts of unsuitable data. The publication of Mr Yule’s recent 
paper on Association will also lead to the use of his method of pseudo-ranks. The 


36—2 





errors). 
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coefficients of association and colligation are in our opinion wholly fallacious, they 
represent no true properties of the actual distributions, and they have no adequate 
physical interpretation. The coefficient obtained by the method of pseudo-ranks 
is equally fallacious, unless the variables proceed by and have been grouped by 
discrete units. But both Mr Yule’s methods are so easy of application, that those 
who will not devote the small amount of time and energy requisite to the dis- 
cussion of data by more adequate processes at once adopt them without further 
consideration. Thus Mr Yule’s coefficient of association is passing into French 
statistical literature as Vindice de corrélation, a term originally introduced by 
Galton for the coefficient of correlation and now transferred to the different and we 
hold fallacious Yulean measure of association. Professor Niceforo writes *: “ Mais 
pour nos études sur la corrélation entre les différents phénoménes économiques, 
démographiques et autres, dans les quartiers et les arrondissements des grandes 
villes, o nos trouvions en présence de séries formées par un nombre plutdét re- 
streint d’éléments (80, 25, 20) nous avons préféré nous servir de la méthode Yule, 
plus rapide et donnant, quoique moins précise que la méthode précédente, des 
résultats trés satisfaisants.” The precediug method is the method of the product- 
moment which Professor Niceforo discards for Mr Yule’s association coefficient, 
using it solely—and apparently with Mr Yule’s approval (see loc. cit. p. 324)— 
for absolutely continuous variates, where the coefficient of correlation could be at 
once found and a graph easily drawn of the regression line. Professor Niceforo 
speaks everywhere in his paper of the correlation being this or that, and entitles 
his paper “Contribution à l’étude des corrélations entre le bien-être économique
et quelques faits de la vie démographique.” What his or Mr Yule’s test of
“résultats très satisfaisants” may be we do not know, but we consider that the
whole of Professor Niceforo’s work will have to be repeated before anything 
can be learnt from his data. 

We have worked out two illustrative cases from Professor Niceforo’s material 
to indicate what we consider the extreme danger of Mr Yule’s methods. In the 
first case, that of the average stature of conscripts and the average rent in the 20 
arrondissements of Paris, the answer Professor Niceforo gives is Q = 1 ± 0. The
correlation we are told is perfect. In the second case, that of the correlation 
between the mortality and natality of the same arrondissements, we are told that 
the “corrélation...est très forte.” Professor Niceforo gives it as:

Indice de corrélation R = 0.977 ± 032,450 (sic!).

We had failed to give any real interpretation to either of these results, and we 
turned to the original data. These we found hard to discover, because Professor 
Niceforo does not refer at each stage to the exact source of his original material. 
Manouvrier gives the mean stature of the 20 arrondissements in 1880 and 1881†.
These appear to be what is given as Statura media in Professor Niceforo’s book
Forza e Ricchezza, Turin, 1906, p. 15.

* Journal de la Société de Statistique de Paris, 52 Année, pp. 322–341; see p. 324 and elsewhere.
† Bulletins de la Société d’Anthropologie de Paris, Année 1888, p. 161.

For the rents we have not been able to
verify Professor Niceforo’s returns, or to discover whether they are for approxi- 
mately the same years as the statures. In the Annuaire Statistique de la Ville de 
Paris, 1901, no similar details as to rent appear to be given, and Professor Niceforo 
gives no reference to the year for his rent data. Taking, however, the data given 
in his book we find : 

Mean Stature = 1645.75.           Mean Rent* = 210.8.
Standard Deviation = 6.212.       Standard Deviation = 161.43.

Correlation of Stature and Rent = .7825 ± .0591.
Regression line of Stature S on Rent R:
S = 1639.4 + .03011 R.
This is represented on Diagram XXIV and we see at once that we have a quite
intelligible relation between average rent in the arrondissement and average
stature of conscripts. The correlation is high, but far from “perfect,” and is subject
to a large probable error, so that the true relation might be easily anything between
.66 and .90. If, however, we divide at the mean values and add up all the dots as
Professor Niceforo appears to have done, we have the fourfold table:

     1 |  8
    11 |  0

giving association Q = 1 ± 0. What is the value of this, what does it signify?
We fail to extract the least idea of the real relationship, as represented by the
dots on the graph, from such a statement. Verbally it means simply that no
rent of over 210.8 is associated with a stature under 1645.75, but to assert that
this signifies that “la corrélation est parfaite” is totally misleading. Professor
Niceforo speaks of the correlation between stature of conscripts and the rent
of their arrondissements of origin; he does not say that he is merely dealing with
the practically unimportant fact that no arrondissement with a mean rent over
211 has given a mean stature under 1646. But even to state this simple fact
would be more enlightening than to talk of the index of correlation equalling
unity.

Diagram XXIV. Association of Stature and Rent in Parisian Arrondissements.
(Axis label: Mean Rent in Parisian Arrondissement.)

* We are unable to say in what units or for what periods rent is measured.
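The contrast between the two summaries of the stature and rent data can be checked with a short Python sketch. It rebuilds the regression line from the constants quoted in the text and computes Q for the fourfold table. The function names are illustrative only, and the assignment of the four counts to quadrants (11 low-low, 0 low stature with high rent, 1 high stature with low rent, 8 high-high) is an assumption made here from the verbal description of the empty cell.

```python
def regression_line(mean_y, sd_y, mean_x, sd_x, r):
    """Least-squares line of y on x from means, standard deviations and r."""
    slope = r * sd_y / sd_x
    intercept = mean_y - slope * mean_x
    return intercept, slope


def yule_q(a, b, c, d):
    """Yule's coefficient of association for the fourfold table a, b / c, d."""
    return (a * d - b * c) / (a * d + b * c)


# Constants quoted in the text for stature (S) and rent (R).
intercept, slope = regression_line(1645.75, 6.212, 210.8, 161.43, 0.7825)
print(f"S = {intercept:.1f} + {slope:.5f} R")

# Assumed quadrant layout: one empty cell forces Q to unity.
print("Q =", yule_q(11, 0, 1, 8))
```

Any table with an empty off-diagonal cell returns Q = 1, whatever the remaining three counts are, which is why the coefficient says so little here.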

The second case we took was that of the mortality and natality of the same 
20 arrondissements. Again we had great difficulty in tracing the original source 
of the information. 

However in the Annuaire Statistique de la Ville de Paris, Année 1904, we find, 
p. 125, the natality per year based upon 1000 women of ages 15—49, and, on 
p. 135, the annual mortality based on 1000 inhabitants for these arrondissements. 
The natality is for the period 1886—1895, the mortality is given for the periods 
1886—1890 and 1891—1895 separately but not combined. As Professor Niceforo
does not give a reference to the years dealt with, nor the source of this Parisian 
data, we have taken the simple mean of the mortality for the two periods and 
correlated this with the natality for 1886—1895. The constants found are: 

Mean Mortality = 21.73.           Mean Natality = 79.065.
Standard Deviation = 4.9215.      Standard Deviation = 23.7298.
Correlation = r = .9163 ± .0242.
Regression Line of Natality N on Mortality M:
N = 4.4181 M − 16.94.


Diagram XXV indicates the position of the observations and their relation to
the regression line. It conveys as adequate a representation of the whole relationship
as it is possible to give on the data. But if we count up on the graph the
dots in the quadrants obtained by drawing the lines at the means we obtain the
fourfold division [table illegible] leading again to Q = 1 ± 0*. What information as to
the real nature of the correlation is given by such a result?

If Professor Niceforo had desired to obtain a rapid approximate value to the
true correlation in these cases, he should have drawn his divisions at the medians,
not the means, and used Sheppard’s formula

$$r = \cos\left(\pi\,\frac{b}{a+b}\right) \quad\text{for the table}\quad \begin{array}{c|c} a & b \\ \hline b & a \end{array}$$

to find tetrachoric $r_t$. In this case his results would have been the tables

    8 | 2           9 | 1
    2 | 8    and    1 | 9

giving

$r_t$ = .81 ± .05, for rent and stature,
$r_t$ = .95 ± .04, for natality and mortality.


* Professor Niceforo gives .977 probably from slightly different data.

Both results correspond within the limits of the probable errors with the actual
correlations .78 and .92. These results are far more valid than the association
results, but of course have not the value of a graph showing the regression line. 
Now here is a case of a man proposing to deal with a most interesting problem, 
for which quite serviceable data exist, led at once from the track of sound treat- 
ment by the application of this fallacious doctrine of association! And this is far 
from a solitary instance of the harm Mr Yule has done by the publication without 
adequate warning or guidance to his readers of the section of his text-book 
treating of association. 
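Sheppard’s median-dichotomy formula quoted above is easy to evaluate. The following Python sketch, with an illustrative function name, applies it to the two median tables given in the text (a concordant, b discordant in each row).

```python
import math


def sheppard_r(a, b):
    """Sheppard's formula r = cos(pi * b / (a + b)) for a table divided at
    the medians, with a the concordant and b the discordant count."""
    return math.cos(math.pi * b / (a + b))


rent_stature = sheppard_r(8, 2)         # table 8 | 2 / 2 | 8
natality_mortality = sheppard_r(9, 1)   # table 9 | 1 / 1 | 9
print(round(rent_stature, 2), round(natality_mortality, 2))
```

The formula presupposes a division at both medians, so that the table is symmetrical; it is not applicable to the tables divided at the means.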


Diagram XXV. Association of Natality and Mortality in Parisian Arrondissements.

(15) Further Criticisms of Mr Yule’s Methods of Controversy.


To certain minor points of Mr Yule’s memoir reply will be made in the present 
section. 



(a) Partial Correlations formed from the Normal Coefficient (loc. cit. p. 627). 
Mr Yule is deliberately confusing two different ideas, the correlation of A and B, 
two continuous variables, for a constant value of a third variable C, with the 
correlation of A and B for a given range of values summed under a certain class- 
index or group of class-indices of the variable C. The former is the only sense in 
which the term partial correlation has been hitherto used, and there is no reason 
why Mr Yule should deliberately confuse this sense with a wholly different conception,
that of the correlation of A and B for a sub-universe of C. The whole
theory of the correlation of sub-universes has been dealt with by Pearson* and the
formulae obtained in 1901 were known shortly afterwards to be perfectly general,
although the proof of this generality was only recently published†.


Let $s_1$ be the standard deviation of the sub-universe selected, $\sigma_1$ its standard
deviation before selection. Let the three characters be represented by the subscripts
1, 2, 3; then if we write

$$\rho_{12} = r_{12}\sqrt{1 - s_1^2/\sigma_1^2}, \qquad \rho_{13} = r_{13}\sqrt{1 - s_1^2/\sigma_1^2},$$

we have for the correlation within the sub-universe‡

$${}_1\bar{r}_{23} = \frac{r_{23} - \rho_{12}\rho_{13}}{\sqrt{1-\rho_{12}^2}\,\sqrt{1-\rho_{13}^2}}$$

and

$${}_1r_{23} = \frac{r_{23} - r_{12}r_{13}}{\sqrt{1-r_{12}^2}\,\sqrt{1-r_{13}^2}}$$

for the partial correlation coefficient.

In the case of normal correlation to which Mr Yule is referring, $s_1$ is the
standard deviation of the truncated portion of a normal curve. For his special
cases when that portion is one-half the frequency curve

$$s_1^2 = \sigma_1^2 - \frac{2}{\pi}\,\sigma_1^2 .$$

Hence

$$\rho_{12}^2 = \frac{2}{\pi}\,r_{12}^2, \qquad \rho_{13}^2 = \frac{2}{\pi}\,r_{13}^2,$$

$${}_1\bar{r}_{23} = \frac{r_{23} - \frac{2}{\pi}\,r_{12}r_{13}}{\sqrt{1-\frac{2}{\pi}\,r_{12}^2}\,\sqrt{1-\frac{2}{\pi}\,r_{13}^2}} .$$

But it is equally feasible to get almost in a line the value of ${}_1\bar{r}_{23}$ for any
truncated portion of the normal curve other than one-half, and tables for
determining the values of ${}_x\mu_1$ and ${}_x\mu_2$, the moments of the tail about the severance
ordinate at x, giving

$$s_1^2 = \sigma_1^2\left({}_x\mu_2 - {}_x\mu_1^2\right),$$

were calculated by Dr Alice Lee and published in 1908§. These functions were
termed the incomplete normal moment functions. Had Mr Yule paid attention to
any of this work, he would hardly have published his special illustration and
remarked “At present I have not been able to carry the matter further” (loc. cit.
p. 628). The general formula had been given eleven years ago, and tables from
which it was quite easy to calculate special cases were published four years ago!
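The relation between the two coefficients may be made concrete by a minimal Python sketch of the selection formula, under the assumption that character 1 is the one selected and that only its standard deviation is changed from σ₁ to s₁. The function names and the three illustrative correlations are hypothetical and are not taken from any table in the text.

```python
import math


def subuniverse_correlation(r23, r12, r13, s1_over_sigma1):
    """Correlation of characters 2 and 3 inside a sub-universe selected on
    character 1, whose standard deviation is reduced from sigma1 to s1."""
    shrink = 1.0 - s1_over_sigma1 ** 2          # 1 - s1^2 / sigma1^2
    rho12 = r12 * math.sqrt(shrink)
    rho13 = r13 * math.sqrt(shrink)
    return (r23 - rho12 * rho13) / math.sqrt((1 - rho12 ** 2) * (1 - rho13 ** 2))


def partial_correlation(r23, r12, r13):
    """Ordinary partial correlation: the limiting case s1 = 0."""
    return subuniverse_correlation(r23, r12, r13, 0.0)


# Hypothetical correlations, for illustration only.
r23, r12, r13 = 0.5, 0.6, 0.4

# Half of a normal curve: s1^2 = sigma1^2 (1 - 2/pi).
half_ratio = math.sqrt(1.0 - 2.0 / math.pi)
print(partial_correlation(r23, r12, r13))
print(subuniverse_correlation(r23, r12, r13, half_ratio))
print(subuniverse_correlation(r23, r12, r13, 1.0))   # no selection: r23 itself
```

The three printed values differ, which is the point at issue: selection round a value and selection at a value give different coefficients.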

* Phil. Trans. Vol. 200 A, pp. 1–66.
† Biometrika, Vol. VIII. p. 437.
‡ Phil. Trans. loc. cit. Eqn. (lvi), p. 25.
§ Biometrika, Vol. VI. p. 66.

Pearson did not give a name to his coefficient ${}_1\bar{r}_{23}$ above, but he carefully
distinguished it from the partial coefficient ${}_1r_{23}$ and stated that the former
generalised form which did not select at a given value but round it was more
important for both natural and artificial selection (loc. cit. p. 31). Mr Yule has
apparently just awoke to the importance of ${}_1\bar{r}_{23}$, but that is no reason why he
should confuse it in the minds of his readers with ${}_1r_{23}$, or lead his readers to believe
that we do not know the difference between the two. To avoid confusion of this
kind in future we shall henceforth speak of ${}_1r_{23}$ as a singular partial correlation and
${}_1\bar{r}_{23}$ as a plural partial correlation. For, the former expresses the relation between
A and B for a single value of C and the latter for a plurality or universe of values
of C. In actual practice there is little difficulty in determining ${}_1\bar{r}_{23}$ if there is
enough material, for all we have to do is to take the given universe out of C and
correlate the resulting A’s and B’s. On the other hand, when we speak of the
relation of health of child to health of mother for constant employment of mother
or constant habits of mother, we do not look upon the universe of employed
mothers as a whole or the universe of mothers with bad habits as a whole. We
are thinking of employment of mother as a graduated character and parental habit
also as a graduated character, and we properly use ${}_1r_{23}$ to measure the relationship
of health in mother and child for a constant grade of employment or constant
grade of bad habit in the mother. In this case the use of ${}_1r_{23}$ has precisely the
same justification, if $r_{23}$, $r_{12}$ and $r_{13}$ are found by tetrachoric tables, as if they had
been found by product-moments, provided the assumption of a Gaussian distribution
be reasonably justified for the material in question. There is no other source of
error in the use of ${}_1r_{23}$ as Mr Yule obscurely seems to indicate. It did not need
Mr Yule’s numerical illustration (loc. cit. p. 629) to prove that ${}_1\bar{r}_{23}$ for the two
sections of an unequally divided normal curve, ‘defectives’ and ‘undefectives’
(sic!), is in neither case equal to ${}_1r_{23}$. The two coefficients have different values
and different significance whether the frequency be Gaussian or non-Gaussian.


(b) Mr Yule’s Failure to distinguish between Criticism of Method and
Criticism of Conclusion.


We have seen in the course of this paper that Mr Yule’s coefficient of
association automatically rises in all cases examined when our dichotomy is very
one-sided. This is very obvious even in skew distributions; compare Diagrams
XVIII, XIX, XXI and XXII where the rapid increase of Q for small percentages
is obvious. Heron working on the Gaussian surface had demonstrated
that this was an absolute necessity which flowed from the theory and that therefore
it must follow, even for surfaces only approximately Gaussian, that two or
more values of Q were quite incomparable if the dichotomic lines were at
different percentages of marginal frequencies. He argued that no valid proof
could therefore be based on the relative sizes of Q in a series of tables for which
we knew nothing about the frequency and for which the dichotomic lines gave
marginal frequencies having any continuously ascending order.
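The dependence of Q on the position of the dichotomy can be illustrated numerically. The Python sketch below is only an illustration under stated assumptions: a standard Gaussian surface with a fixed correlation of .5, both characters divided at the same deviate h, and the quadrant frequency found by Simpson’s rule rather than from published tables. All function names are hypothetical.

```python
import math


def phi(x):
    """Standard normal ordinate."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def big_phi(x):
    """Standard normal integral up to x."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def upper_quadrant(h, rho, steps=4000, upper=10.0):
    """P(X > h, Y > h) on a standard Gaussian surface, by Simpson's rule."""
    width = (upper - h) / steps
    root = math.sqrt(1.0 - rho * rho)

    def integrand(x):
        # ordinate of X at x, times the chance that Y exceeds h given X = x
        return phi(x) * big_phi((rho * x - h) / root)

    total = integrand(h) + integrand(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(h + i * width)
    return total * width / 3.0


def q_for_dichotomy(h, rho):
    """Yule's Q when both characters are divided at the deviate h."""
    d = upper_quadrant(h, rho)
    p = 1.0 - big_phi(h)            # marginal proportion beyond h
    b = c = p - d
    a = 1.0 - 2.0 * p + d
    return (a * d - b * c) / (a * d + b * c)


for h in (0.0, 1.0, 2.0):
    print(h, round(q_for_dichotomy(h, 0.5), 3))
```

With the correlation held fixed, the printed Q rises as the division moves from the median towards the tail, so two values of Q taken at different marginal percentages cannot be compared.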


Such invalid discussions as to the “apparent law that associations were on 
the whole higher where populations were healthier or more defective” and on 
the relation of change of association between two defects with age had actually 
been attempted by Mr Yule*. Heron criticised, it will be seen, not the laws, if 
they be laws, but Mr Yule’s attempt to investigate them by “association.” He 
wrote†: “For precisely similar reasons his discussion of the change of association
with age must be dismissed as entirely fallacious. There may be, and probably is, 
some decrease of association with advancing age, but the enunciation of such a 
law on the basis of a number of coefficients of association is purely idle.” Heron 
then proceeded to show that the proportions of blind and of mentally deranged 
both increased steadily from infancy to age, and this signifies that if the frequency 
surfaces were of the same type and really had the same correlation at each age 
the Q would steadily decrease with age. It will be seen that Heron’s criticism 
applied solely to Mr Yule’s methods; it was an unanswered and, we believe, 
absolutely unanswerable criticism of the absurdity of trying to deduce laws from Q. 
How does Mr Yule meet it? He writes‡: “Dr Heron also objects to my conclusion
that association decreases with age. His objection appears to be that the
product sum correlation does not decrease so markedly or regularly with age in 
one of my cases that he examined...and that no evidence has been given that 
the normal coefficient decreases.” 


The destructive criticism that Q for the Gaussian and for all surfaces of which
we have any practical experience increases the more one-sided is the dichotomy
(and Mr Yule thinks nothing of .02% dichotomies) is not met at all. The criticism
was of the method of forming an inference, and not as to whether the inference 
led to a law which could be otherwise substantiated. “There may be, and 
probably is, some decrease of association with advancing age,” wrote Heron, “but 
the enunciation of such a law on the basis of a number of coefficients of association 
is purely idle.” The truth of the law or its falsity is of no great importance, but 
that Mr Yule should reach it by a fallacious method is of fundamental importance. 
Mr Yule seeks by the words “Dr Heron also objects to my conclusion that
association decreases with age” (an objection never raised) to confuse the really
destructive criticism, that Q, unlike tetrachoric $r_t$, having no intelligible correction
for the one-sidedness of its dichotomies, is a function of the dichotomic percentages
and therefore two Q’s based on different percentages are wholly incomparable.





* Phil. Trans. Vol. 194 A, pp. 309 et seq.
† Biometrika, Vol. VIII. p. 119.
‡ Loc. cit. p. 637.

(c) Fallacies involved in the use of Percentages. Coefficient of Colligation.

Mr Yule has endeavoured to give his Q a physical meaning by deducing from
it for the “equivalent symmetrical table” his coefficient of colligation ω. For
such a table

                |       A       |     Not-A
    ------------+---------------+---------------
    B ...       |     √(ad)     |     √(bc)
    Not-B ...   |     √(bc)     |     √(ad)
    ------------+---------------+---------------
    Totals      | √(ad) + √(bc) | √(ad) + √(bc)

$$\omega = \frac{\sqrt{ad}-\sqrt{bc}}{\sqrt{ad}+\sqrt{bc}} = \frac{1}{100}\left\{\left(\text{percentage of } A\text{'s which are } B\text{'s}\right) - \left(\text{percentage of not-}A\text{'s which are } B\text{'s}\right)\right\}.$$

We have given grave reasons for doubting the process by which Mr Yule deduces
this table from his original data. But this very method of percentages itself
is liable to gross misinterpretation, and illustrations of this occur throughout
Mr Yule’s text-book*.


Given a fourfold table:

                |   A   | Not-A |
    ------------+-------+-------+-------
    B ...       |   a   |   b   |  a+b
    Not-B ...   |   c   |   d   |  c+d
    ------------+-------+-------+-------
                |  a+c  |  b+d  |   N

the percentage difference is $q_v = 100\left(\frac{a}{a+c} - \frac{b}{b+d}\right)$ for the vertical treatment and
$q_h = 100\left(\frac{a}{a+b} - \frac{c}{c+d}\right)$ for the horizontal treatment. Which is to be taken as a
real measure of the relationship? Mr Yule in his text-book uses either and
apparently has some personal scale of values. He gives no probable errors which
alone could give any soundness to his discussions. Actually we find

$$\text{P.E. of } q_v = 67.449\sqrt{\frac{ac}{(a+c)^3} + \frac{bd}{(b+d)^3}},$$

$$\text{P.E. of } q_h = 67.449\sqrt{\frac{ab}{(a+b)^3} + \frac{cd}{(c+d)^3}}.$$


* Theory of Statistics, 1911.

Now here are a few of Mr Yule’s percentage coefficients:

(1) Imbeciles and Deaf-mutes (Yule, loc. cit. pp. 33–34).

                      | Imbecile | Non-Imbecile |   Totals   |
    Deaf-Mute         |      451 |       14,795 |     15,246 | q_v = 0.877
    Non-Deaf-Mute     |   48,431 |   32,464,323 | 32,512,754 | q_h = 2.809
    Totals            |   48,882 |   32,479,118 | 32,528,000 |

(2) Datura (Yule, loc. cit. p. 37).

                      | Violet | White | Totals |
    Prickly           |     47 |    21 |     68 | q_v = −7.840
    Smooth ...        |     12 |     3 |     15 | q_h = −10.882
    Totals            |     59 |    24 |     83 |

(3) Houses in course of erection in Urban and Rural Districts (Yule, loc. cit.
p. 62).

                      | Built | Building | Totals |
    Urban             |  4960 |       50 |   5010 | q_v = 6.714
    Rural             |  1749 |       12 |   1761 | q_h = 0.317
    Totals            |  6709 |       62 |   6771 |

(4) Eye-colour in Father and Son (Yule, loc. cit.).

                               Father.
                      | Light | Not-Light | Totals |
    Son: Light        |   471 |       148 |    619 | q_v = 36.570
    Son: Not-Light    |   151 |       230 |    381 | q_h = 36.457
    Totals            |   622 |       378 |   1000 |

(5) Developmental Defects and Dullness (Yule, loc. cit. p. 45, where the
numbers are reduced to 10,000).

                      | With Defects | Without | Totals |
    Dull              |          888 |    1186 |   2074 | q_v = 33.528
    Not-Dull          |         1420 |   22793 |  24213 | q_h = 36.951
    Totals            |         2308 |   23979 |  26287 |

These instances will suffice, and now let us sum them in a table with their
probable errors.

    Table     |  Values of q    | Mr Yule’s Judgment on the Table
    ----------+-----------------+-------------------------------------
    (3), q_h  |  0.317 ± 0.163  | “Distinct positive Association”
    (1), q_v  |  0.877 ± 0.029  | “High degree of Association”
    (1), q_h  |  2.809 ± 0.093  | “High degree of Association”
    (3), q_v  |  6.714 ± 3.40   | “Distinct positive Association”
    (2), q_v  |  7.840 ± 5.77   | “No Association”
    (2), q_h  | 10.882 ± 7.93   | “No Association”
    (5), q_v  | 33.528 ± 0.68   | “Very high indeed”
    (4), q_h  | 36.457 ± 2.05   | “Shows the tendency to resemblance”
    (4), q_v  | 36.570 ± 2.05   | “Shows the tendency to resemblance”
    (5), q_h  | 36.951 ± 0.73   | “Very high indeed”

Now Mr Yule has used the method of percentages in a curious manner; some- 
times he compares a/(a+c) with b/(b+d) but at other times with (a+b)/N; 
sometimes he uses the percentages found both ways, sometimes only found one 
way. He has throughout failed to give the probable errors of the differences 
of the percentages, which might have influenced his judgment, but he leaves his 
readers to believe that some inference as to the intensity of association can be 
founded merely upon such relative percentage differences. He indeed tells us (loc.
cit. p. 651) to distinguish between the intensity of an association and the reliability
of that intensity, so that we must presume that in speaking of the grade of the
association, he does not form his judgment in relation to the probable error.
Now in this table we find percentage differences of 7.8 and 10.9 belong to
tables which in Mr Yule’s judgment exhibit no association, but tables with
differences of 0.3 have “distinct positive association” and of 0.8 have “high degree
of association.” One table with a difference of 36 merely shows the “tendency”
to association; another with the same percentage difference has association “very
high indeed.” For any given table there are six ways in which the difference of
percentages can be enumerated, namely

$$\frac{a}{a+c}-\frac{b}{b+d},\qquad \frac{a}{a+c}-\frac{a+b}{N},\qquad \frac{b}{b+d}-\frac{a+b}{N},$$

$$\frac{a}{a+b}-\frac{c}{c+d},\qquad \frac{a}{a+b}-\frac{a+c}{N},\qquad \frac{c}{c+d}-\frac{a+c}{N}.$$


Mr Yule sometimes uses one, sometimes another of these methods to reach 
his judgment of the degree of association in a table. He has given us his 
judgment with regard to the association of developmental defects and dullness 
for the partial universe of those without nerve signs (loc. cit. pp. 45—46), he says 
the association is “very high indeed.” It may according to the percentage difference
chosen be either 2.92 or 36.96. He has further given us his judgment on the
association of developmental defects and dullness for those with nerve signs, he
says it is “very small.” It may according to the percentage difference be 5.15 or
0.96; thus the percentage difference might be higher than in the very “high
association indeed” case in the partial universe of those without nerve signs.


Finally in the “high association” case of the total universe, the percentage 
difference might be 1.91 or 51.03. Clearly in judging by percentages the conclusion
will depend on which percentage is worked out first. Is it not clear
that, however Mr Yule may have reached his judgments of no, small, high or very 
high association, the percentage difference is not his actual measure and could 
only confuse the tyro in statistics, for whom he introduces this “simple” method ? 
But if difference of percentage has obviously no correlation with Mr Yule’s 
judgment of these grades of association, what weight can possibly be given to the 
coefficient of colligation in determining association? The chief merit of that 
coefficient according to Mr Yule is that it has—what Q has not—a physical 
meaning; but for him percentage differences of 1.91 and 36.96 alike mark “very
high associations” and differences of 0.96 and 10.88 alike mark very small or no
association. We have no standard and clearly Mr Yule has none of how such 
differences of percentages are to be interpreted. Mr Yule has by his treatment 
of percentages a priori destroyed any rational meaning that could be given to 
his own coefficient of colligation as a measure of relationship. Is not Professor 
Edgeworth’s question answered? The coefficient is a “colligation,” not a “profound 
truth.” Mr Yule obviously lays no consistent stress whatever on percentage 
differences in practice. 


(d) Mr Yule’s use of Pearson’s “Transfer” (ad − bc)/N as a
Measure of Association.

Among Mr Yule’s many means of testing association (no two of which give as
a rule the same result) perhaps the most striking is his use of the “Transfer”
(ad − bc)/N, to which he gives a new name and letter, the “common difference” δ*.
One of us had already suggested that the “transfer” per unit of the total frequency
might be used as a coefficient of association of an inadequate character†. It is inadequate
because it makes no correction for class-indices and none for the centroids
of the quadrants. Thus it does not lie between 0 and 1, and is largely affected by
the position of the dichotomic lines. Mr Yule has preferred to use instead of the
transfer per unit of total frequency simply the transfer, his “common difference
δ,” and the results add further evidence of the vagueness of the whole of his
conceptions of association. He tells us, to begin with, that “the difference of the
cross-products may be very large if N be large, although δ is really very small...
the difference should be compared with N, or it will be liable to suggest a higher
degree of association than actually exists” (loc. cit. p. 37). To illustrate his
method he uses some of Bateson and Saunders’ data for Datura from the Report
to the Evolution Committee, 1902:

                        Colour of Flower.
                    | Violet | White | Totals
    Prickly ...     |     47 |    21 |     68
    Smooth ...      |     12 |     3 |     15
    Totals ...      |     59 |    24 |     83

The difference of the cross products is 252 − 141 = 111 and then Mr Yule
proceeds to tell us that “at first sight this considerable difference is apt to
suggest a considerable association.” He then divides by 83, and writes: “But
δ = 111/83 = 1.3 only, so that in point of fact the association is small, so small
that no stress can be laid on it as indicating anything but a fluctuation of
sampling” (p. 37).

* Theory of Statistics, p. 36.
† Phil. Trans. Vol. 195 A, p. 14.


That Mr Yule is content with this process is evident from the opening words 
of the following paragraph: “While the methods used in the preceding pages 
suffice for most practical purposes, it is often very convenient to measure the 
intensities of association in different cases by means of some formula or ‘coefficient’.” 
We now know what Mr Yule considers “sufficient for most practical purposes”! 
Here are a few tables to illustrate it.

(i)
                    |     A     |   Not-A   |   Totals
    B ...           |   266,374 |   233,626 |   500,000
    Not-B ...       |   233,626 |   266,374 |   500,000
    Totals ...      |   500,000 |   500,000 | 1,000,000

(ii)
                    |     A     |   Not-A   |   Totals
    B ...           |   934,579 |    31,153 |   965,732
    Not-B ...       |    31,153 |     3,115 |    34,268
    Totals ...      |   965,732 |    34,268 | 1,000,000

(iii)
                    |     A     |   Not-A   |   Totals
    B ...           |   999,000 |       450 |   999,450
    Not-B ...       |       450 |       100 |       550
    Totals ...      |   999,450 |       550 | 1,000,000

The values of Mr Yule’s δ for these three tables are
(i) 16374, (ii) 1940, (iii) 100.

If anything is to be judged from these results, (i) is far more highly associated
than (ii) and (ii) again than (iii). All are far more highly associated than the
Datura, which we will call (iv), and then the order of association sinks from (i) to
(iv) in a marked manner.

Now here are Mr Yule’s coefficients of association put against his δ’s:

δ: (i) 16374, (ii) 1940, (iii) 100, (iv) 1.34,
Q: (i) .130, (iv) .282, (ii) .500, (iii) .996.

As the one series goes down the other goes up! What, we ask, can be learnt
from Mr Yule on the subject of association, when his methods, “sufficient for most
practical purposes,” thus contradict themselves?

Here in (i), (ii) and (iii) we have kept the total frequency constant, but
perhaps the most absurd side of Mr Yule’s δ is manifest if we alter the total N
of the observations. Suppose Bateson and Saunders had experimented with
8300 plants instead of 83, then δ would have been 134 instead of 1.34. The
association is of course absolutely the same, but how would Mr Yule interpret
his two δ’s?
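The contradiction between the two measures, and the dependence of δ on the mere number of observations, can be verified directly. The Python sketch below uses illustrative function names and takes δ = (ad − bc)/N with the cells in the order a, b / c, d; for the Datura table the sign comes out negative, the text quoting magnitudes only.

```python
def common_difference(a, b, c, d):
    """Yule's common difference: difference of cross-products over N."""
    return (a * d - b * c) / (a + b + c + d)


def yule_q(a, b, c, d):
    """Yule's coefficient of association."""
    return (a * d - b * c) / (a * d + b * c)


TABLES = {
    "i": (266374, 233626, 233626, 266374),
    "ii": (934579, 31153, 31153, 3115),
    "iii": (999000, 450, 450, 100),
    "iv": (47, 21, 12, 3),          # Datura
}

for name, cells in TABLES.items():
    print(name, round(common_difference(*cells), 2), round(yule_q(*cells), 3))

# One hundred times as many plants in the same proportions.
scaled_datura = tuple(100 * cell for cell in TABLES["iv"])
print(common_difference(*scaled_datura), yule_q(*scaled_datura))
```

Multiplying every cell by one hundred multiplies δ by one hundred and leaves Q untouched, so δ cannot be a measure of the intensity of association.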


We regret having to draw attention to the manner in which Mr Yule has 
gone astray at every stage in his treatment of association, but criticism of his 
methods has been thrust on us not only by Mr Yule’s recent attack, but also by 
the unthinking praise which has been bestowed on a text-book which at many 
points can only lead statistical students hopelessly astray. 


(e) Mr Yule’s Assumption as to Absurdities which must arise if 
Normal Distribution be applied to the “ Blind.” 


Another interesting fallacy is developed by Mr Yule on p. 638 of his paper. 
He writes: 


“Consider for a moment what the assumption of normality of distribution would imply in 
any case where there is an increase of, say, the blind from one age-group to the next. This 
must imply either (1) a fall in the mean of the assumed variable character, goodness of sight, 
I suppose—if the standard deviation is constant or falling, or (2) an increase of the standard 
deviation if the mean is constant or rising. If the first occurs, then there must be some people 
in the later age-group who are much more blind than any people in the first, and fewer people of 
first-class sight ; if the second, there must still be some people in the later group much blinder 
than any in the earlier, and there will also be some of much better sight. On the assumption 
that lies at the base of the normal coefficient, you cannot, in fact, effect a change in the 
numerical proportion of A’s without changing them qualitatively at the same time. The 
assumption seems to me absurd, to be equivalent in this case to saying that there are certain 
people entirely deprived of sight in the first age-group, and certain others more than entirely 
deprived of sight in the second. The normal coefficient is accordingly inapplicable, and its 
precise values of no special significance.” 


We have rarely come across a more specious fallacy. If it were true it would 
be impossible in practical statistics to represent both a population and a selected 
sub-population by normal curves. Let the original population be N, mean M,
and standard deviation Σ, and let the selected population be n, mean m, and
standard deviation σ. Then the second curve will always pass outside the
first if

$$(m - M)^2 + 2\,(\Sigma^2 - \sigma^2)\,\log_e\left(\frac{n\Sigma}{N\sigma}\right) > 0.$$

This will happen of course if Σ = σ, Mr Yule’s first case. In this case the
curves cut on one side only of the means in the point

$$x = \tfrac{1}{2}(m + M) + \frac{\log_e N - \log_e n}{m - M}\,\sigma^2.$$

In the case of blindness we should have m > M*, and all it would signify
would be that after this value, x, of badness of sight, the older age would have
for each grade of bad sight more individuals than at the lesser age; since the
two curves both extend to infinity there is no question, as Mr Yule suggests, of
persons being “much more blind” than at the younger ages. Mr Yule is simply
confusing in his own mind or in the minds of his readers two senses of “more
blind,” i.e. more blind persons of each grade, and blind men of greater degree
of blindness than actually can occur at the younger age. If he merely means
to say that the Gaussian extends to infinity in both directions, that is a very
old objection on the theoretical side to the curve; it has little value in practical
statistics, where there is a reasonable approach to normality.

If m = M, then we have

$$x = M \pm \sigma\sqrt{\frac{2\Sigma^2\left\{\log_e(N\sigma) - \log_e(n\Sigma)\right\}}{\sigma^2 - \Sigma^2}}.$$

It is necessary therefore for real roots that σ > Σ, if σN be > nΣ. This is
Mr Yule’s second case, and this would give increasing numbers of persons in
each grade of good sight beyond the value of x from M for increasing old age.
We should have thought that this was a very improbable state of affairs, as it
is almost certain that sight deteriorates with age after childhood at least. If we
take the 1891 census data† used by Mr Yule, we have:





For ages : 45—55 55—C5 | 


| | 
j 
| 
| 


Males in general ... 1,191,789 770,124 | 


Blind males Sry 1,752 1,905 | ; 
If we suppose in these cases m = M, we find X/= = 2974 and a/o = 2°810, and 
since the dichotomy at “blind” is the same for both curves, we must have X =a, 

* We have taken positive axis towards worse sight. 

+ Vol. mz. pp. v. and lvii. Mr Yule has clubbed together those blind from childhood and the 
numbers, 12 times as great for these years, not blind from childhood. That so many acquire blindness 
indicates what a range of graduated sight there must be unless we suppose blindness to arise in- 
stantaneously. 

Biometrika 1x 38 
or σ = 1.058Σ. Thus a six per cent. increase in σ suffices to provide the increased
blind population at the greater age; the two curves intersect at x = ± 2.968σ, or
somewhat beyond the “blind” boundary. For all grades beyond this there will
be more persons of each grade of bad sight. There is nothing inconceivable or
improbable in this; but, on the other hand, there would be beyond x = − 2.968σ,
on the good sight side, a number of grades of better sight with more people at the
older age. This, of course, is not impossible, but it seems far more reasonable to
suppose the average sight to grow worse with old age, and in addition to change
its variability somewhat. If the variability remained the same, as we have seen
under the first case, there is no excess in the grades of marked good sight in the
population of older ages. Let us now consider what happens if the mean be
shifted and the σ increased. Taking m > M*, we have


$$\frac{x - m}{\sigma} = \pm \frac{\Sigma}{\sigma^2 - \Sigma^2} \sqrt{(m - M)^2 + 2(\sigma^2 - \Sigma^2) \log_e \frac{\sigma N}{\Sigma n}} - \frac{\sigma (m - M)}{\sigma^2 - \Sigma^2}.$$

Now let us take σ = 1.01Σ, and therefore for the “blind” groups at 45-55 and
55-65 we have as before X = 2.9740Σ, x = 2.8105σ = 2.8386Σ to fix the dichotomy.
Hence

(m − M)/Σ = (X − x)/Σ = .13538.
Thus we find (x − m)/σ = + 2.6738 or − 16.2791.


The former value shows that from some little distance outside the dichotomic 
line each grade of bad sight and blindness has more individuals of that grade at 
ages 55-65 than at ages 45-55. The latter value indicates that the point of 
intersection of the sight curves for the two ages on the side of good sight takes 
place at a point so extremely distant from the average sight that not a single 
individual would occur with such sight in a population many thousand times 
greater than the actual population. 


We think it most probable, however, that a third case, not even referred to by 
Mr Yule, best describes what actually takes place—namely, that the sight at the 
older age gets worse and is less variable, not more variable. To illustrate this, 
take σ = .99Σ, then X = 2.9740Σ, x = 2.8105σ = 2.7824Σ. Hence

(m − M)/Σ = .19158, and (x − m)/σ = + 2.4742 or + 16.5875.

Thus the older age curve now never cuts the younger age curve of sight on the 
side of good sight at all. It cuts on the side of bad sight twice, once somewhat on 
the good sight side of the “blind” dichotomic line, and on the other occasion 
immensely beyond the limits of the populations in question. In other words, 
the older ages have fewer members in each grade of good sight and more members 
in each grade of very bad sight. This appears to us a perfectly reasonable state 
of affairs, and of course extends far beyond the ratio selected for σ/Σ.
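The intersections quoted for the second and third cases can be checked numerically. The following is a minimal sketch in Python, not the authors' own working: the function names `dichotomy` and `intersections` are hypothetical, lengths are measured in units of Σ, and the roots are those of the quadratic obtained by equating two Gaussian frequency curves with totals N and n. The dichotomy values 2.9740 and 2.8105 are taken from the text as defaults.

```python
import math
from statistics import NormalDist

# 1891 census figures quoted in the text (males in general, blind males).
N_YOUNG, BLIND_YOUNG = 1_191_789, 1_752   # ages 45-55
N_OLD, BLIND_OLD = 770_124, 1_905         # ages 55-65


def dichotomy(total, blind):
    """Distance of the "blind" boundary from the mean, in s.d. units."""
    return NormalDist().inv_cdf(1.0 - blind / total)


def intersections(ratio, big_x=2.9740, small_x=2.8105):
    """Return the two roots (x - m)/sigma where the age curves cross.

    ratio is sigma/Sigma and must differ from 1 (the equal-variability
    case has a single intersection and is not handled here).
    """
    sigma, cap_sigma = ratio, 1.0
    shift = big_x * cap_sigma - small_x * sigma        # m - M
    diff = sigma ** 2 - cap_sigma ** 2                 # sigma^2 - Sigma^2
    log_term = math.log(sigma * N_YOUNG / (cap_sigma * N_OLD))
    root = cap_sigma * math.sqrt(shift ** 2 + 2.0 * diff * log_term)
    return tuple((-sigma * shift + sign * root) / diff for sign in (1, -1))


if __name__ == "__main__":
    print(dichotomy(N_YOUNG, BLIND_YOUNG), dichotomy(N_OLD, BLIND_OLD))
    print(intersections(1.01))   # more variable older age
    print(intersections(0.99))   # less variable older age
```

With σ = 1.01Σ one root falls on each side of the mean; with σ = .99Σ both roots are positive, which is the statement that the older curve never cuts the younger on the side of good sight.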

It will thus, we think, be clear that had Mr Yule attempted to turn his 
half-baked notions into figures before he expressed them in words, he would 


* The positive direction of the variate is towards bad sight. 
have realised that more people in the more blind grades or more people in the
better grades of sight are not the same thing as some people “ much more blind” 
or some people of “much better sight” than any in the earlier aged groups. 
Mr Yule’s alternatives are not real alternatives, he makes no reference to a shift 
in mean and a decrease in variability; such a combination involves only a reduction 
of the numbers in the grades of good sight, a very reasonable result with increasing 
age, and an increase of the numbers in the grades of bad sight, also a very 
reasonable hypothesis. Towards the ‘tails’ of both age curves, theoretically 
there would be fractional units, while in actual observations there would be 
isolated units at relatively wide intervals (cf. Galton’s “Difference Problem*”).
What the distribution of such units might be, could not be a priori predicted. 
But it is quite possible for two distributions with slightly separated means and 
slightly different variabilities to give quite reasonable fits to Gaussian curves 
and yet the distribution with the greater variability to have no outlying units 
with “much more” of a character than any which exist in the less variable 
distribution. The variabilities are much more closely determined by the bulk 
of cases with moderately large deviations than by the one or two extreme 
outlying individuals. 

In the case we have last discussed the age-group 45-55 has two individuals
lying outside 4.65Σ, and the distance between the means being .19Σ this gives
4.46Σ = 4.51σ for the corresponding distance on the 55-65 age curve. Outside
this limit are 2.5 individuals of this older age curve. Are we to say that that
half individual represents Mr Yule’s necessity for some people in the later age- 
group who are “much more blind” than any people in the first? In truth some 
people would be much less blind, if they would only stay to express their opinions 
in actual numbers before writing them down. The “ minute sifting of numerical 
results” is the foundation of all true statistical inference, and here, as in other 
phases of his recent work, Mr Yule has committed himself to superficial statements 
reached by verbal disquisition which vanishes into nothingness if the touchstone 
of numerical investigation be applied to it. 
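The counts of "two" and "2.5" outlying individuals can be reproduced from the Gaussian tail. This is a sketch only, under the assumptions of the third case above (means .19Σ apart, σ = .99Σ); the helper name `upper_tail` is hypothetical, and the census totals are those quoted earlier for males in general.

```python
import math


def upper_tail(z):
    """Proportion of a Gaussian population lying beyond z standard deviations."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


N_YOUNG, N_OLD = 1_191_789, 770_124   # males aged 45-55 and 55-65
SHIFT, RATIO = 0.19, 0.99             # (m - M)/Sigma and sigma/Sigma

limit_young = 4.65                             # in units of Sigma, from M
limit_old = (limit_young - SHIFT) / RATIO      # same point in units of sigma, from m

expected_young = N_YOUNG * upper_tail(limit_young)
expected_old = N_OLD * upper_tail(limit_old)

if __name__ == "__main__":
    print(round(limit_old, 2), round(expected_young, 2), round(expected_old, 2))
```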

There are many other points at which we should like to traverse Mr Yule’s 
statements, but we think we have brought forward enough evidence to indicate 
how unreliable are his methods and how biased are his criticisms. 


(16) Summary of General Conclusions. 
In order to sum up the general conclusions reached in this paper, we must 
state first one or two principles which we accept as almost axiomatic: 


(i) There is no universal method of dealing with an n x n-fold table, except 
the method of mean square contingency, leading to a probability measure of the 
independence of the two characters, unless we know: 


* Biometrika, Vol. 1. p. 390. 
(a) that both characters proceed by discrete units, and are tabled as such.


In this case the method of pseudo-ranks is identical with that of the product- 
moment, and there never has been doubt as to how the table is to be treated; or: 


(b) that the frequency of the two characters is continuous and that this
frequency follows or approximates to a definite theoretical system. There is only
one such frequency system, which has up to the present been effectively discussed,
i.e. that of Laplace, or as it is more frequently but less justly called, that of
Gauss*. If the distribution be Gaussian or approximately Gaussian, there are
many ways of dealing with an n x n-fold table. 


(ii) A majority of the cases which occur in statistical practice are so close 
to the Gaussian distribution that methods based upon Gaussian theory will give 
useful first approximations, i.e. correlations within ± .05 say of the true values.


Years ago one of the present writers insisted on the non-Gaussian character 
of many variables. But he also remarked on the large number of variables 
which can be described with sufficient practical accuracy by a Gaussian 
distribution. 


The present discussion demonstrates that even with distributions markedly 
skew the Gaussian theory, if applied to 2 × 2-fold tables without extreme
dichotomies, will give results not differing by more than .05 from the value of
the true correlation and often differing by much less. Roughly, we may say 
that for reasonable divisions, the divergence between the true correlation and 
that obtained by Gaussian theory is hardly ever of practical importance and indeed 
in “populations” of the size usually dealt with rarely exceeds twice the probable 
error. 


(iii) The coefficient of correlation has such valuable and definite physical 
meanings that if it can be obtained for any material, even approximately, it is 
worth immensely more than any arbitrary coefficients of “association” and 
“ colligation.” 

Starting from these principles we ask ourselves to what data Mr Yule proposes 
to apply his three processes ; 

(a) The Boas-Yulean φ for fourfold tables.
(b) The coefficient of pseudo-ranks. 
(c) The coefficient of association or that of colligation. 

We have shown in this paper that for tables with a finite number of cells 
of the order 5 × 5-fold to 8 × 8-fold, the method of pseudo-ranks must lead to a
value below and often 40 % below the true correlation of variates. Mr Yule has
stumbled into a statistical pitfall, for he has neglected the fact that correlation of
ranks is not correlation of variates, and that his correlation of ranks would still
have to be corrected for the class-index correlations, i.e. he has also neglected the


* Of course both these writers only dealt with the frequency of one variate; Bravais extended it to 
two, but gives no admissible proof of his formula, which he practically gets by analogy. 
existence of huge brackets. We may, we hold, entirely dismiss from statistical
practice this method of pseudo-ranks, except for the case wherein it has always 
been used, i.e. for discrete unit variates, classified by units, where it coincides with 
the usual product-moment method and needs no special name. 


We have only to note here that Mr Yule uses the method of pseudo-ranks, 
which we hold to be demonstrably false, to make very sweeping charges, which 
can be and have been met, against the pigmentation work of the Biometric School. 
He not only suggests that the workers on the pigmentation data were foolish, but 
that they were dishonest. That is the sort of attack which usually recoils on the 
head of the man who makes it, especially when he has for several years worked 
in the Department against which he prefers the accusation. As a matter of fact 
the non-Gaussian character, the variability of tetrachoric r_t, for different divisions
was recognised very soon after it had been applied. But the investigations then
made and more amply illustrated in this memoir indicate that the values originally
given were substantially correct, the inheritance of intensity of pigmentation
between parent and offspring lies between .46 and .50; it is not of the order 4
as Mr Yule asserts on the basis of a theory which we feel convinced he will have 
to withdraw, if he wishes to maintain any reputation as a statistician. 


We have shown in the course of this memoir that the coefficient of asso- 
ciation Q, if treated merely as an undefined measure of association, has not 
for varying dichotomies the stability of the tetrachoric coefficient and it appears 
to have no reasonable physical meaning even for the cases which he has selected. 
Mr Yule has deduced from it a second coefficient, that of colligation, which has, 
he says, a physical meaning, when the table has been dressed in an artificial 
manner, namely it signifies on this artificial table the excess of the percentage 
of A’s that are also B’s over not-A’s that are also B’s. We have shown from 
Mr Yule’s own writings that such a difference of percentages has in his own 
practice no meaning at all from the standpoint of association. 


Mr Yule never tells us clearly when we are to use one or other of his co- 
efficients. He spends 16 pages of his memoir on discussing the application of 
his coefficients of colligation and association to the vaccination data; yet on p. 611 
he writes: “For discontinuous attributes—attributes proper, as we might term 
them—the true correlation is that given by formula (24) or (26) [i.e. the Boas- 
Yulean coefficient or Pearson’s φ]; we are dealing with a variable in fact, which
can only take two values as distinct from a variable exhibiting a normal or any 
other continuous distribution. Tables I, III and IV [i.e. the vaccination data], 
as it seems to me, represent precisely such a case.” Here Mr Yule has given up 
colligation as applied to vaccination; if so why devote 16 pages to its discussion?
But 20 pages later Mr Yule tells us that: “For investigations on smallpox and
vaccination such as those of Brownlee and Macdonell and Turner, the use of Q or ω
would, in my opinion, have been more illuminating as well as simpler than the
use of the normal coefficient” (p. 631). 
Mr Yule does not really appear to know which of his ducklings to prefer,
even for what in his estimation—although not in ours—are discrete attributes. 
We are quite clear that none of them are appropriate. Still less are they 
appropriate for the definitely continuous data to which Mr Yule and his disciples 
apply them. 

The controversy between us is much more important than an idle reader will 
at once comprehend. It is the old controversy of nominalism against realism. 
Mr Yule is juggling with class-names as if they represented real entities, and his 
statistics are only a form of symbolic logic. No knowledge of a practical kind 
ever came out of these logical theories. As exercises for students of logic they 
may be of educational value, but great harm will arise to modern statistical 
practice, if Mr Yule’s methods of treating all individuals under a class-index as 
identities become widespread, and there is grave danger of such a result, for his 
path is easy to follow and most men shirk the arduous. 

The very large amount of arithmetic involved in this paper would have been 
impossible without friendly help from a number of our colleagues; we have 
especially to thank Miss Julia Bell and Mr Herbert E. Soper; the former for 
much work in calculating and the latter for diagram draughtsmanship as well as 
calculation. Miss Ethel M. Elderton also most kindly undertook one or two 
pieces of heavy arithmetic. We can hardly hope to have escaped numerical slips, 
but we feel confident that such slips, if they occur, will as frequently tell against 
us as for us; and we have not, knowing the fallibility of the best calculators, 
done more than draw attention to points where our arithmetic differs from that
of Mr Yule. We have laid sole stress on errors of interpretation and on fallacious
theory.


APPENDIX I. 


ON THE FALLACY OF ASSERTING PERFECT ASSOCIATION WHEN ONE
QUADRANT IN A FOURFOLD TABLE IS VACANT.


In all the values of the probable errors hitherto determined the constants 
of the distribution found in the formula are truly constants of the actual 
distribution of which the population under discussion is a sample. Because we 
do not know the actual distribution, we replace its constants by those of the 
observed sample. This method will as a rule not lead us astray, but it may do 
so grievously in cases where the observed frequencies take limiting values. For
example, consider the population described in the following fourfold table: 











TABLE I.

            A          Not-A     Totals
B           971,138    22,862    994,000
Not-B       5,862      138       6,000
Totals      977,000    23,000    1,000,000
and let this distribution be the real population or an absolutely proportional sample.
Now for this population with the scheme

    a | b
    c | d

we have ad − bc = 0, and all our measures of association vanish. If we take a
sample of n′ from such a population of n, the value η = (ad − bc)/n² will not for
the sample be zero, but, if

η′ = (a′d′ − b′c′)/n′²,

where a′, d′, b′ and c′ are the values in the sample, will have a standard
deviation*

$$\sigma_\eta = \frac{\sqrt{(a + b)(a + c)(d + b)(d + c)}}{n^2 \sqrt{n'}}.$$

Now let us compare this with

$$\sigma_{\eta'} = \frac{\sqrt{(a' + b')(a' + c')(d' + b')(d' + c')}}{n'^2 \sqrt{n'}},$$

the value of σ_η derived from the sample itself. Now to units the value of the
sample might be











TABLE II.

            A      Not-A    Totals
B           971    23       994
Not-B       6      0        6
Totals      977    23       1000














and it is clear that σ_η and σ_η′ will be exactly the same, as they depend only on
the marginal totals. Thus

$$\sigma_\eta = \sigma_{\eta'} = \frac{1}{\sqrt{1000}} \sqrt{.994 \times .977 \times .006 \times .023} = .000366.$$

But η′ = − .000138. Accordingly (ad − bc)/N² when d = 0 is not significant, having
regard to its probable error, and the association is zero. On Mr Yule’s theory
Q = − 1 and its probable error is zero.
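The arithmetic of Tables I and II is easily verified. The sketch below is illustrative only; the function names `eta` and `sigma_eta` are hypothetical, and the standard deviation is computed from the marginal totals as in the formula quoted above for a sample of n′.

```python
import math


def eta(a, b, c, d):
    """(ad - bc)/n^2 for a fourfold table with cells a, b, c, d."""
    n = a + b + c + d
    return (a * d - b * c) / n ** 2


def sigma_eta(a, b, c, d, sample_size):
    """Standard deviation of eta for zero association, from the marginal totals."""
    n = a + b + c + d
    margins = (a + b) * (a + c) * (d + b) * (d + c)
    return math.sqrt(margins) / (n ** 2 * math.sqrt(sample_size))


TABLE_I = (971_138, 22_862, 5_862, 138)   # population
TABLE_II = (971, 23, 6, 0)                # sample of 1000, to units

if __name__ == "__main__":
    print(eta(*TABLE_I), eta(*TABLE_II))
    print(sigma_eta(*TABLE_I, 1000), sigma_eta(*TABLE_II, 1000))
```

The observed η′ is well under half its standard deviation, which is the sense in which it is "not significant".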


Now the standard deviation of Q, for Q zero, is 
vn’ V ma. << ee 
for a sample of n’, where a, b, c, d refer to the original population. 


i* /(ato)(o+ d)(b +d) (a +b) 3 


Let us consider the differences for the above material which will arise from
calculating ₀σ_Q on the population in Table I and on the sample in Table II. On
Table I we find ₀σ_Q = 1.37; on Table II ₀σ_Q′ = 2.62. Hence either Table is


* Pearson, “On a novel method of Regarding the Association of two Variates,” Drapers’ Research
Memoirs, Biometric Series, No. VII, Dulau & Co., 1912, p. 7.
absolutely compatible with Q being zero. It is true that the value of σ_Q as
estimated from the sample is nearly double what its value would be truly
estimated from the population, but both values suffice to show that Q = 0 is as
reasonable an hypothesis as to the constitution of the material as Q = − 1, and
far more reasonable than Q = − 1 ± 0. The method of mean square contingency
gives us χ² = .1455, leading by Palin Elderton’s Tables to P = .971, or if the
material were truly independent only in three cases per hundred should we get
a better fit than Table II to Table I in taking samples of 1000.


The fact is that in drawing random samples of 1000 from Table I, the 
distribution of frequency in the cell d is given by 


(.999862 + .000138)^1000 = .8712 + .1202 + .0086 (for all terms beyond second).
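The three terms of this expansion are ordinary binomial probabilities and may be recomputed directly. A minimal sketch follows; the name `binomial_pmf` is hypothetical, and the chance .000138 is the proportion of the population of Table I falling in quadrant d.

```python
import math

P_D = 138 / 1_000_000   # chance that one individual falls in quadrant d
SAMPLE = 1000           # size of each random sample


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)


p_zero = binomial_pmf(0, SAMPLE, P_D)   # d = 0 in the sample
p_one = binomial_pmf(1, SAMPLE, P_D)    # d = 1 in the sample
p_more = 1.0 - p_zero - p_one           # d greater than unity

if __name__ == "__main__":
    print(round(p_zero, 4), round(p_one, 4), round(p_more, 4))
```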


Hence in 100 samples of Table I, d would be zero in 87 cases, unity in 12 and
greater than unity in about one case. In other words Mr Yule’s Association
Coefficient would for material with true zero association be − 1 in about 87 %
of cases and of the order + .75 in 12 % other cases. In all these cases, however,
the probability method shows practical independence. It must accordingly be
recognised that it is extremely dangerous, if zero frequency be found in one
quadrant, to assert that Q = ± 1 ± 0. For, a population of zero association would
give such a value in 87 % of samples of 1000 in a case like that under consideration.
The tetrachoric method fails also for this extreme case, for the simple reason
that the very continuity of the method excludes thinking in isolated units.
If the real population were that of Table I, then on the basis of continuity we
should expect .138 from the infinite skirt of the Gaussian surface in quadrant d.
But we might extend this volume of the skirt up to something under .5 before we
should anticipate a whole unit to appear in d. There is no suggestion of this
kind about Q, for Mr Yule directly discards all conception of such continuous
frequency surfaces. This point must be borne in mind in applying tetrachoric r_t;
a quadrant of zero frequency does not necessarily signify that in the theoretical
Gaussian distribution this quadrant would have zero frequency. We may equally
reasonably assume it anything up to .5. In such a distribution as


TABLE III.

            A      Not-A    Totals
B           21     450      471
Not-B       529    0        529
Totals      550    450      1000


we shall alter the tetrachoric coefficient from negative unity to a high value
somewhat less than negative unity, by inserting anything up to .5 in quadrant d,
but we shall not swing the correlation from − 1 through 0 to a small positive
value by the process. We shall get a reasonable minimum value for r_t, and we
shall be convinced that r_t = − 1 ± 0 is not necessarily a true representation of the
state of affairs, although the correlation is negatively high. But in such a table
as Table II the whole character of the correlation is changed by inserting a
frequency x, less than .5, in this d quadrant.


The first question to be answered is: If we put a small frequency x in d, how
are we to take it from the other quadrants? The only reasonable answer here
seems to be: “Take it in proportion to their frequencies.” Thus Table II
becomes of the form:


TABLE IV.

971 (1 − x/1000)     23 (1 − x/1000)         994 (1 − x/1000)
6 (1 − x/1000)       x                       6 (1 − x/1000) + x
977 (1 − x/1000)     23 (1 − x/1000) + x     1000


Now can we choose x so that ad − bc = 0, for this Table? For this we must
have:

971 (1 − x/1000) x = 23 (1 − x/1000) × 6 (1 − x/1000),

which leads to x = .1421.

This shows us that on the Gaussian hypothesis a value of x absolutely
consistent with a zero being recorded in that quadrant leads to zero association.
This for most practical purposes would be sufficient. But a little further
consideration here is desirable; we may write the table in the form:


TABLE V.

970.8620 − x     22.9967 + x     993.8587
5.9992 + x       .1421 − x       6.1413
976.8612         23.1388         1000


Here x = 0 gives a table of zero association, x = .1421 gives, if we can only
record to a unit individual, the table of experience, i.e.


TABLE VI.

971    23    994
6      0     6
977    23    1000





Now η of Table V = − x/1000. Therefore the probable error of x = 1000 times
that of η. If we find ₀σ_η for x = 0 in Table V, we have for its value .0003714,
quite close to the value we obtained for Table I from which we originally deduced
Table II. Hence the probable error of x = .67449 × 1000 × ₀σ_η = .2505. It will
be clear therefore that with such a probable error for x, the frequency of the
d quadrant might actually be anything from 0 to .5, and yet the Table of
experience takes the form VI, when we can record only by units.
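The value of x, the standard deviation of η for Table V and the resulting probable error of x can be recomputed as follows. This is a sketch under the stated assumption that the small frequency is taken from the other three quadrants in proportion to their frequencies; the variable names are hypothetical.

```python
import math

A, B, C, TOTAL = 971, 23, 6, 1000   # Table II with zero in quadrant d
PE_FACTOR = 0.67449                 # probable error = 0.67449 x standard deviation

# 971 (1 - x/1000) x = 23 (1 - x/1000) * 6 (1 - x/1000); one factor cancels.
x = B * C / (A + B * C / TOTAL)

shrink = 1.0 - x / TOTAL
a0, b0, c0, d0 = A * shrink, B * shrink, C * shrink, x   # Table V with x = 0

margins = (a0 + b0) * (a0 + c0) * (d0 + b0) * (d0 + c0)
sigma_eta = math.sqrt(margins) / (TOTAL ** 2 * math.sqrt(TOTAL))
probable_error_x = PE_FACTOR * TOTAL * sigma_eta

if __name__ == "__main__":
    print(round(x, 4), round(sigma_eta, 7), round(probable_error_x, 4))
```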

Suppose d had been .5, x = − .3579, then we have for the theoretical table:

TABLE VII.

971.2199     22.6388     993.8587
5.6413       .5000       6.1413
976.8612     23.1388     1000

which solved by the tetrachoric process gives
r_t = + .224 ± .187,
i.e. the value of r_t is not significant*.

On the other hand if we put d = 0, i.e. deal with Table VI, tetrachoric r_t is
unity, because on the Gaussian hypothesis only complete association is compatible
with absolute zero in this quadrant and then any sample will exhibit absolute
correlation. But, as we have just seen, the zero in quadrant d could arise from
samples of a population in which d was small but not zero, and that in a particular
case 87 % of samples of a material with zero association would show this zero
quadrant. The accompanying diagram gives the values of tetrachoric r_t for
various hypotheses as to x. The dotted rectangle is bounded by the lines which
give vertically twice the probable error of x for x = 0, and horizontally twice the
probable error of r_t for true r_t = 0. We have also placed once the probable error of
tetrachoric r_t from the plotted r_t on either side giving the broken curve. It will
be seen that from the case of d = .5, r_t = + .224 ± .187 up to d = .0025 and
r_t = − .400 ± .779, there is no significance in the values of r_t found by the
tetrachoric process. Even after this we cannot assert that the values of r_t obtained
would be significant†; for the proof of the formula of the probable error of r_t
depends upon δd being small as compared to d. Now σ_d = √{d (1 − d/N)} = √d,
very nearly when d is very small. Hence δd/d is of the order √d/d = 1/√d which
will be large if d be less than unity‡. The probable error of r_t = 1 is only zero if
d is absolutely zero for the population which is being sampled and not if it is
merely zero in the sample. But we only know that population through the
sample, and we see that in such cases as we are considering the zero in the
sampled population is only likely to occur in a small proportion of the cases dealt
with. The tetrachoric process clearly fails in such cases, but we see that with


* If we apply Q we have Q = .584 ± .332, pointing rather more in the direction of significance, but
such application of Q is illegitimate, as the fractionising of the theoretical surface has no meaning for a
coefficient which is based on complete neglect of the nature of the frequency-distribution.

† In fact the tetrachoric r_t series-equation rapidly becomes divergent and the formula for the
probable error takes an indeterminate form.

‡ The like failure occurs in Mr Yule’s proof for the probable error of Q, although he has not
warned his readers of this; it is accordingly not applicable, if one quadrant has zero frequency.
any reasonable assumption with regard to x, the tetrachoric process suggests not
perfect but zero association; the perfect association is only reached as a limiting
case, although Mr Yule’s coefficient gives an unhesitating unity*.

* In the Table selected by Mr Yule to exhibit the variations of his coefficient of association (Phil.
Trans. Vol. 194 A, p. 277) 51 out of the 164 values shown are ± 1 owing to the occurrence of a zero
quadrant. With the introduction of a .5 into this quadrant (quite reasonable on the basis of a
continuous variate) his values may be reduced from + 1 to + .72 or swing over from − 1 to + .98, etc.,
etc.! What is the value in such cases of the statement that there is “perfect association”?
We have seen that if the sampled population be really one of zero association,
the sample may have in quadrant d in 87 % of cases zero frequency, but in 12 %
of cases unit frequency. This unit must, working to unit individuals, be withdrawn
from a, b and c. The most probable case is the addition of a unit to both
a and d and their withdrawal from b and c, or again we may draw units respectively
from a, b or c. We may consider only these four cases:


























(α)  972 |  22 |  994        (β)  971 |  22 |  993
       5 |   1 |    6               6 |   1 |    7
     977 |  23 | 1000             977 |  23 | 1000

(γ)  971 |  23 |  994        (δ)  970 |  23 |  993
       5 |   1 |    6               6 |   1 |    7
     976 |  24 | 1000             976 |  24 | 1000





These lead to 











                                              |     α     |     β     |     γ     |     δ
Value of tetrachoric r_t                      | .39 ± .15 | .36 ± .16 | .38 ± .15 | .35 ± .16
P.E. of r_t = 0 for same marginal totals      | .00 ± .27 | .00 ± .25 | .00 ± .27 | .00 ± .25
Value of Association Q                        | .80 ± .14 | .76 ± .16 | .78 ± .13 | .75 ± .16
P.E. of Q = 0 for same marginal totals        | .00 ± .23 | .00 ± .24 | .00 ± .23 | .00 ± .25





It will be seen that r_t is always less than 2.5 and Q always greater than
4.5 times the probable error. The value of r_t obtained is always less than
1.5 times the probable error of r_t = 0 for the same marginal totals, while Q is
3 times the probable error of Q = 0 for the same marginal totals. Thus the
tetrachoric method warns us that the association is probably zero, while the 
association coefficient emphasises a high value of the association. 
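The Q row of the table can be recomputed from the four cases. The sketch below assumes the usual formula for the standard deviation of Q, (1 − Q²)/2 × √(1/a + 1/b + 1/c + 1/d), multiplied by .67449 to give the probable error; the function names are hypothetical, and the results may differ from the printed values by a unit in the last figure. The tetrachoric values are not reproduced here.

```python
import math

PE_FACTOR = 0.67449   # probable error = 0.67449 x standard deviation

# Cells (a, b, c, d) of the four cases with a unit placed in quadrant d.
CASES = {
    "alpha": (972, 22, 5, 1),
    "beta": (971, 22, 6, 1),
    "gamma": (971, 23, 5, 1),
    "delta": (970, 23, 6, 1),
}


def yule_q(a, b, c, d):
    """Coefficient of association Q = (ad - bc)/(ad + bc)."""
    return (a * d - b * c) / (a * d + b * c)


def probable_error_q(a, b, c, d):
    """Probable error of Q from the four cell frequencies."""
    q = yule_q(a, b, c, d)
    spread = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return PE_FACTOR * 0.5 * (1.0 - q * q) * spread


if __name__ == "__main__":
    for name, cells in CASES.items():
        print(name, round(yule_q(*cells), 2), round(probable_error_q(*cells), 2))
```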


To sum up: The correct process in these cases with zero in one quadrant 
is not to assert that the association is perfect with Mr Yule, but to apply first the 
probability test and determine whether the material may not rather be a random 
sample from an original population of zero association. The failure of the
tetrachoric r_t in these cases is rendered evident in the working, we reach non-convergent
series and are thrown back on a limiting case; if we place in the zero quadrant a
small frequency less than .5, which would correspond to zero in the actual table,
we find a finite value of r_t, but one non-significant having reference to its probable
error, unless we approach close to the limit of d = 0, in which case the probable
error of r_t is so far undetermined, because the ordinary process fails to be valid.


Mr Yule’s view that association is perfect when there is zero in any quadrant 
ignores the fact that he can only deal with a sample of the true population, and 
so far as we are aware he has not warned students of the extreme danger of his Q
in these cases. The non-applicability of tetrachoric r_t to cases in which one of the
quadrants has zero or small relative frequency is, on the above and on other 
grounds, well-known to those who habitually work with it, and the rule has long 
been to avoid where possible all such extreme dichotomies. This point has been 
largely disregarded by Mr Yule in his criticism of tetrachoric methods. 


APPENDIX II. 


On THE TEST OF GOODNESS OF FIT OF OBSERVATION TO THEORY 
IN MENDELIAN EXPERIMENTS. 


An objection to the χ², P, test of goodness of fit, has recently been raised with
somewhat unconscious humour by certain ardent Mendelians. A simple illustration
may be taken. Suppose the mating to be


(DR) × (DD) = 50 % (DR) + 50 % (DD).


For example, let there be 1000 offspring and let 480 of these be recognised by 
later experiment as (DR)’s and 520 as (DD). Then the standard deviation is 


√(1000 × ½ × ½) = 15.8


and the observed deviation is 20, and P=about ‘90, or the fit is quite good. 
But suppose the observations are 


480 (DR)’s, 519 (DD)’s and 1 (RR),
i.e. Observation: 480 519 1
     Theory:      500 500 0

Clearly

$$\chi^2 = S\left\{\frac{(\text{observation} - \text{theory})^2}{\text{theory}}\right\} = \infty,$$


and P = 0, or the probability of observation being a random deviation from theory 
is zero, there is absolute badness of fit. Hence either theory or observation is at 
fault. It is not, however, the theory of “goodness of fit” which fails in such a 
case, but the Mendelian theory which wants mending. If we put a black balls,
b white balls and c red balls in a bag, the theory will tell us whether any sample
of black, white and red balls can be reasonably considered as a random extract 
from that bag. But if we are presented with a series of black, white, red, and 
green balls and asked what is the probability that these were drawn from that bag, 
we must assert that it is zero, however concordant with theory the results for the 


black, white and red balls alone may be. There are only two courses open to the 
experimenter who gets green balls however few in number out of a black, white
and red ball theory : 


(i) to assert that they are anomalies due to his temporary colour-blindness, 
to mistakes in his observations, or to some misinterpretation of his results, which 
are not accurately describable in terms of his original categories, or 


(ii) to amend his original theory by inserting some green balls a posteriori 
into the bag and starting afresh to calculate his probabilities. 


The one course weakens the weight we must lay on his record or on his choice 
of categories; the other tends to discredit his a priori theory. To adopt a third 
course and assert that we want a new test for “goodness of fit” of theory to 
observation, which shall cover such discrepancies, i.e. which shall slur over 
divergencies between a priori theory and a posteriori results, may appeal to 
our sense of human fallibility but scarcely to our appreciation of scientific logic. 
We shall be left with the suspicion that the theory is plastic and the observations 
elastic. What criterion of “goodness of fit” can the theory of probability provide 
when it is a case of applying plastic theory to elastic observations? The answer 
surely is none whatever until the plasticity of the theory has been quantitatively 
studied, and until the errors of the record have been quantitatively stated. Either 
we must be told that the observer will mistake a red ball for a green one in 
so many per cent. of cases, or we must be told that the theory will be inaccurate 
in so many per cent. of cases. Personally we think it possible that all attempts to 
find a “ good fit” of a plastic theory to elastic observations are idle. It is a con- 
sideration of the green balls, which are said not to be in the black, red and white bag 
at all, which is often the basis of marked scientific progress. That atmospheric 
nitrogen differed from pure nitrogen was just such a “green” ball; but the plastic 
theory that air consisted of oxygen and nitrogen only had been confirmed by many 
elastic observations before Lord Rayleigh followed up his “green” ball.

These remarks are suggested by the following paragraphs in two recent 
Mendelian publications. Dr Raymond Pearl writes* concerning Mendelian
data:

“A determination might be made of the ‘goodness of fit’ of theory to 
observation by Pearson’s method, were it not for the fact that that method 
cannot be applied to cases like the present.” 

Dr Pearl says it cannot be applied because he finds green balls, where his 
theory puts only black, white and red into his bag. It is either his observation 
record or his Mendelian theory, not the mathematics of “goodness of fit,” which 
needs modification. Dr Pearl continues in a footnote as follows: 


* The Journal of Experimental Zoology, Vol. XIII. p. 203.

“The difficulty lies in the fact that Pearson’s test depends upon a variable

$$\chi^2 = S\left\{\frac{(m_r' - m_r)^2}{m_r}\right\},$$

where $m_r$ is the theoretical frequency and $m_r'$ the observed. Now obviously in
any distribution where even one $m_r$ is zero, the value of $\chi^2$ must be infinite
whatever may be the values of the other $m_r$’s or $m_r'$’s. That is, if the theoretically
expected frequency on any base element is numerically zero, the probability against
the whole curve becomes infinite. Thus, for example, suppose a system of


frequencies like the following, a type which is continually arising in Mendelian
work;

Class                                 1     2     3     4     5
Theoretically expected frequency    595   827    68     0    96
Actually observed frequency         594   828    67     1    96

“Now it does not need a mathematical measure of any kind to tell one that in
this case the theoretical and actual distributions are in very close agreement. 
Yet because the theoretical frequency on class 4 is zero, the probability by 
Pearson’s test is literally infinite against the observed distribution being regarded 
as a random sample of a population distributed in accordance with the theoretical 
frequencies. Pearson has indeed himself noted what is essentially this same 
difficulty in using the test on ordinary frequency distributions*.” 


Now what does this paragraph exactly signify? Interpret it in coloured balls;
white, red, black and yellow balls are placed in a bag in large numbers in the
proportions of 595 : 827 : 68 : 96. There are no green balls in the bag. One
such green ball is said to have been drawn. Theory says it is an impossibility, 
and the criterion of goodness of fit says its improbability of occurrence is infinite. 
We can conceive no logical theory doing anything else. Dr Pearl does in fact tell 
us that cases in which a ball, classified as green, comes out of the theoretical 
white, red, black and yellow ball bag are of “a type which is continually arising in 
Mendelian work.” This at any rate is a frank admission. As a matter of fact 
with his arbitrary division between “over 30-egg” and “under 30-egg” hens,
we are not surprised that “green” balls appeared not only in ones, but in twos
and even in fours, and in a few cases in even greater numbers, although this is attributed to
“physiologically extremely favourable” matings (loc. cit. p. 248) as apart from
gametic theory†. What Dr Pearl is seeking is a plastic theory or an elastic
record, not a real criterion of goodness of fit, which must give no finite probability
when green balls come from a bag which contains no green balls! Dr Pearl
continues:


“The point noted obviously limits greatly the applicability of Pearson’s test, 
and in a most unfortunate direction. Tests of goodness of fit are much needed in 
Mendelian work [we cordially agree!]. But it is just here that the classes where


* This is a complete misunderstanding. Pearson says that you must not in the case of continuous 
variation make use of classes which theoretically have each less than unit frequency, where the record 
goes by individual units only. 

† Until Dr Pearl publishes the actual record of each bird, and not merely its class-index, and the
same for its ancestors, it is impossible to estimate his degree of justification for the theoretical 
treatment of his results. 
the theoretical frequency is zero often occur*.” In other words, observation
records something as occurring which existing theory says cannot occur, and 
Dr Pearl asks for a criterion which shall make the impossible only mildly 
improbable. He must either remould his theory or explain away his observations. 
We see no alternative. 
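
The behaviour of the criterion when a class has zero theoretical frequency can be seen in a short sketch. The function name `pearson_chi_square` is our own label, and the frequencies are those of Dr Pearl's five-class illustration. Treating a class with zero theoretical frequency and a non-zero observed count as an infinite term is our reading of the argument, not a library routine.

```python
import math

def pearson_chi_square(expected, observed):
    """Sum of (observed - expected)^2 / expected over all classes.

    Assumption: a class with zero expected frequency but a non-zero
    observed count makes the whole sum infinite.
    """
    total = 0.0
    for m, m_obs in zip(expected, observed):
        if m == 0:
            if m_obs != 0:
                return math.inf   # a "green ball" from a bag holding none
            continue              # 0 expected and 0 observed adds nothing
        total += (m_obs - m) ** 2 / m
    return total

theory = [595, 827, 68, 0, 96]
record = [594, 828, 67, 1, 96]

# All five classes, including the class that theory says is empty.
chi2_all = pearson_chi_square(theory, record)

# The same record with class 4 left out of the comparison altogether.
chi2_without_class4 = pearson_chi_square(theory[:3] + theory[4:],
                                         record[:3] + record[4:])

print(chi2_all)
print(chi2_without_class4)
```

The second figure is small only because the offending class has been dropped, which is precisely the step the criterion refuses to take on the observer's behalf.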


Dr Pearl’s demand for a criterion which shall not be crucial, but allow elastic 
records to fit a plastic theory, is well illustrated by the following paragraph from 
the work of another American Mendelian who is convinced that “feeble-mindedness” 
is a Mendelian unit character, but has found his “green ball” in the normal 
offspring of two feeble-minded parents: 


“These two are apparent exceptions to the law that two feeble-minded parents 
do not have anything but feeble-minded children†. We may account for these
two exceptions in one of several ways. Either there is a mistake in calling them 
normal, or a mistake in calling the parents feeble-minded; or else there was 
illegitimacy somewhere and these two children did not have the same father as 
the others of the family. Or we may turn to the Mendelian law and we discover 


* The remainder of Dr Pearl’s paragraph runs: ‘‘ To determine the probable error of the individual 
frequency in measuring the goodness of fit of Mendelian observation and theory, as was first practised 
by Weldon, and later by Johannsen and by Mendelian workers generally, does not appear to the writer 
to be an altogether sound procedure. It fails to take account of the correlations in errors amongst the 
several frequencies. Yet these are just as important and just as certainly existent in a Mendelian 
‘category’ type of distribution as in the ordinary variation polygon of a continuously variable 
character....Pearson’s test covers this point, and were it not for the other difficulty noted above 
would be much more widely useful in Mendelian work than is actually the case” (loc. cit. p. 204). 
The ‘‘ widely useful” test in Mendelian work is quite obviously one which will overlook negation of 
theory, or not drive the observer back to question the validity of his records or his categories. But 
there is a misstatement in the above sentence which needs correction. If there be only alternative 
categories, e.g. the total of (RR)’s compared with the total of (DR)’s+(DD)’s, then Pearson’s test is 
absolutely identical with the probable-error test. This is of course well recognised; for, if $m_1$ and $m_2$ be
the observed frequencies and $n_1$ and $n_2$ the theoretical frequencies,

$$\chi^2 = \frac{(m_1-n_1)^2}{n_1} + \frac{(m_2-n_2)^2}{n_2} = (m_1-n_1)^2\left(\frac{1}{n_1}+\frac{1}{n_2}\right)$$

$$= \frac{(m_1-n_1)^2}{n_1\left(1-\dfrac{n_1}{N}\right)} = \frac{(m_2-n_2)^2}{n_2\left(1-\dfrac{n_2}{N}\right)}, \quad \text{where } N=n_1+n_2,$$

$$= \frac{(\text{Deviation of either category})^2}{(\text{Standard Deviation})^2},$$

and for this case $P=\sqrt{\dfrac{2}{\pi}}\displaystyle\int_{\chi}^{\infty} e^{-\frac{1}{2}x^2}\,dx$. This test therefore Weldon applied with perfect legitimacy

to the consideration of Mendelian quarters. When Weldon came in the very paper cited by Dr Pearl to 
test more complex Mendelian results, he did not fail to take account of correlations in errors, and 
actually applied Pearson’s criterion (Biometrika, Vol. 1. p. 235). Dr Pearl’s sentence therefore requires 
remodelling ; he has clearly failed to appreciate what Weldon was doing. 
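
The identity between Pearson's criterion and the probable-error test for two alternative categories can be checked numerically. The sketch below uses a hypothetical record of 400 individuals with theoretical frequencies in the ratio 3 : 1. The function names are ours, and the probability is obtained from the complementary error function, which equals the integral quoted in this note.

```python
import math

def chi_square_two_classes(m1, m2, n1, n2):
    """Pearson's criterion for two alternative categories."""
    return (m1 - n1) ** 2 / n1 + (m2 - n2) ** 2 / n2

def squared_standardised_deviation(m1, n1, N):
    """(Deviation of one category)^2 / (Standard Deviation)^2."""
    variance = n1 * (1.0 - n1 / N)   # binomial variance N * (n1/N) * (1 - n1/N)
    return (m1 - n1) ** 2 / variance

def p_one_degree(chi):
    """sqrt(2/pi) * integral from chi to infinity of exp(-x^2/2) dx."""
    return math.erfc(chi / math.sqrt(2.0))

# Hypothetical frequencies, chosen only for illustration.
n1, n2 = 300.0, 100.0      # theoretical
m1, m2 = 290.0, 110.0      # observed
N = n1 + n2

chi2 = chi_square_two_classes(m1, m2, n1, n2)
ratio2 = squared_standardised_deviation(m1, n1, N)
P = p_one_degree(math.sqrt(chi2))

print(chi2, ratio2, P)
```

The first two printed figures agree, so the two tests lead to the same probability whenever there are only two categories.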

† The pedigrees published by Weekes and Goddard show other exceptions to Mendelian law, which
they appear to have overlooked. This is confirmed by similar pedigrees in our Laboratory. The very 
idea that the continuous and highly variable character ‘‘feeble-mindedness” is a ‘‘unit character” 
in the Mendelian sense will do much to check real research into the grave complexities and difficulties 
of this very vague and broad category. 
that according to that law there might be in rare instances such a combination of
circumstances that a normal child might be born from two parents that function as 
feeble-minded. For practical purposes it is, of course, pretty clear that it is safe 
to assume that two feeble-minded parents will never have anything but feeble- 
minded children *.” 


The italics are ours. They are very typical of the manner in which an elastic 
record and a plastic theory are made to fit. No account is anywhere provided 
of this extension of Mendelian theory which “in rare instances” provides a normal 
offspring to the two parents with an abnormal dominant character. Possibly, as 
in Dr Pearl’s case, it is due to the occurrence of “physiologically extremely 
favourable” matings. Anyhow the old definite simplicity of Mendel’s Mendelism 
has gone; with an elastic record and a plastic theory any data may be Mendelian 
—or not—according to the views of the investigator who moulds his theory and 
stretches his facts. 


How welcome to such an one must be the Yulean theory of association!
“Whatever the nature of the classification, however, natural or artificial, definite or
uncertain, the final judgment must be decisive; any one object or individual must
be held either to possess the given attribute or not” (Yule, Theory of Statistics,
p. 9).

In the face of such a direction, how could Dr Pearl have been so foolish as to 
balance a number of his hens on the dichotomic fence of 30-eggs, and allow a 
moiety of each such hen to possess one Mendelian attribute and the other moiety 
its alternative?

It is not difficult to understand, however, why Dr Pearl does not like Pearson’s 
criterion of the goodness of fit of theory and observation. On p. 255 of his 
memoir he gives a table “showing the observed and expected distributions of 
winter egg production for all matings taken together.” He remarks on this 
table that “the lumped figures do not give an altogether fair estimate of the 
matter, but some sort of a summary is necessary.” We agree very cordially 
because a number of the impossible green balls of the subsidiary tables do not 
appear as such when the tables are lumped, but taking the data for what they 


are worth we have the following four series:

Winter Egg Production

                          Over 30    Under 30    Zero
I.    Observation          365.5      259.5      31
      Theory               381.45     257.25     17.30
II.   Observation            2         23        15
      Theory                 0         25        15
III.  Observation           36         79         8
      Theory                26.5       86.75      9.75
IV.   Observation           57.5       98.5      23
      Theory                68.60      95.0      15.40


* The Kallikak Family, p. 114.
Pearson’s criterion gives the following results:

I.    P = .003
II.   P = .000
III.  P = .118
IV.   P = .036


The odds against the first series arising from material following the theory are 
332 to 1; the second series is impossible on the theory; the odds against the third 
series are about 8 to 1; and against the fourth series are about 27 to 1. The combined
odds against even the three series (I, III and IV) representing the theory
are very large indeed. Dr Pearl actually tells us that “the investigator is usually
expected to reject abnormal material” (p. 256). And he prides himself on not having 
done so*, and asks us to form a judgment not on the summary but on the detailed 
data in the body of the paper. We have done so, and the criterion gives still worse 
results. We agree with Dr Pearl that “the high producing hen, somewhat like 
the race horse, is a rather finely strung, delicate mechanism, which can be easily 
upset, and prevented from giving full normal expression to its inherited capacity 
in respect to fecundity” (loc. cit. p. 255). But surely this is only to admit that 
the character chosen was wholly unfit to test the theory upon at all? It does not 
justify rejecting the only scientific test of “goodness of fit,” and then concluding 
from nothing other than general impression that “the cumulative probability 
that the hypothesis applied represents at least a reasonable approximation to 
the true interpretation of the results becomes very great” (loc. cit. p. 257). If 
“cumulative probability” signifies anything at all, it means the theory of 
probability applied to the series to deduce combined odds against the total 
results and these are hopelessly against Dr Pearl. Further we cannot go until 
Dr Pearl publishes his record, which is not yet before us, although he has 
published his own interpretation of it in a great variety of journals.


APPENDIX III. 
ON THE EQUATION TO THE SURFACE OF CoNsTANT Q. 


If the limits to the frequency range in the variate $x$ be $a$ and $a'$, and in the
variate $y$ be $b$ and $b'$, then in the notation of p. 184

$$n_1 = \int_x^a\!\!\int_y^b z\,dx\,dy,$$

$$p = \int_x^a\!\!\int_{b'}^b z\,dx\,dy, \qquad q = \int_{a'}^a\!\!\int_y^b z\,dx\,dy,$$

$$z = \frac{d^2 n_1}{dx\,dy}.$$
* “But in view of the rather hysterical attacks upon geneticists and their method of work in this
country, if for no other reasons, it seems best to follow the plan of publishing all the data.” We 
would remind Dr Pearl that this is exactly what he has not done. We require the quantitative record 
of every individual hen and its ancestry as far as is known before we can fully test the validity 
of his results. 
By the equation in the second line of p. 184, we have

$$n_1 = \frac{(1-\chi)(p+q) - N + \sqrt{\{N-(1-\chi)(p+q)\}^2 + 4\chi(1-\chi)pq}}{2(1-\chi)}.$$

Whence by straightforward but somewhat laborious differentiation we find

$$z = \frac{dp}{dx}\,\frac{dq}{dy}\;\frac{\chi\{N^2 - N(1-\chi)(p+q) + 2(1-\chi)pq\}}{\left[\{N-(1-\chi)(p+q)\}^2 + 4\chi(1-\chi)pq\right]^{3/2}}.$$

This is the equation to the surface of constant Q. As soon as we know the
nature of the marginal frequencies, i.e. the values of $p$ and $q$, we can find the form
of the surface. The above equation is somewhat simplified if we refer $p$ and $q$
to the medians of their frequency-distributions, i.e. write $p=\tfrac{1}{2}N-\alpha$, $q=\tfrac{1}{2}N-\beta$.
In this case

$$z = \frac{d\alpha}{dx}\,\frac{d\beta}{dy}\;\frac{\chi\{\tfrac{1}{2}N^2(1+\chi) - 2(\chi-1)\alpha\beta\}}{\left(\chi N^2 - (\chi-1)\left[(\alpha+\beta)^2 - \chi(\alpha-\beta)^2\right]\right)^{3/2}}.$$
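
The expression for the quadrant frequency and its second differential can be checked numerically. The sketch below assumes that the constant written here as `chi` is the cross-ratio of the fourfold table, so that Q = (chi - 1)/(chi + 1). That reading, the function names and the trial values of N, p and q are ours and are not taken from p. 184.

```python
import math

def n1(p, q, N, chi):
    """Quadrant frequency with margins p, q, total N and cross-ratio chi.

    Assumption: chi = n1 * d / (b * c), with b = p - n1, c = q - n1
    and d = N - p - q + n1.
    """
    if abs(chi - 1.0) < 1e-12:
        return p * q / N                      # independence
    c = 1.0 - chi
    B = N - c * (p + q)
    D = math.sqrt(B * B + 4.0 * chi * c * p * q)
    return (-B + D) / (2.0 * c)

def d2n1_dpdq(p, q, N, chi):
    """Closed form of the second differential of n1 with respect to p and q."""
    c = 1.0 - chi
    B = N - c * (p + q)
    D2 = B * B + 4.0 * chi * c * p * q
    return chi * (N * N - N * c * (p + q) + 2.0 * c * p * q) / D2 ** 1.5

# Hypothetical trial values; chi = 4 corresponds to Q = 0.6.
N, chi = 1000.0, 4.0
p, q = 300.0, 450.0

a = n1(p, q, N, chi)
b, c_cell, d = p - a, q - a, N - p - q + a
cross_ratio = a * d / (b * c_cell)
Q = (a * d - b * c_cell) / (a * d + b * c_cell)

# Central finite difference for the mixed second differential.
step = 1.0
finite_difference = (n1(p + step, q + step, N, chi) - n1(p + step, q - step, N, chi)
                     - n1(p - step, q + step, N, chi) + n1(p - step, q - step, N, chi)) / (4.0 * step * step)

print(cross_ratio, Q)
print(finite_difference, d2n1_dpdq(p, q, N, chi))
```

Multiplying the second differential by the marginal ordinates dp/dx and dq/dy then gives the ordinate z of the surface at the corresponding point.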


If the marginal frequencies are Gaussian, 





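$$\alpha = \frac{N}{\sqrt{2\pi}\,\sigma_1}\int_0^x e^{-\frac{x^2}{2\sigma_1^2}}\,dx, \qquad \beta = \frac{N}{\sqrt{2\pi}\,\sigma_2}\int_0^y e^{-\frac{y^2}{2\sigma_2^2}}\,dy,$$

$$\frac{d\alpha}{dx} = \frac{N}{\sqrt{2\pi}\,\sigma_1}\,e^{-\frac{x^2}{2\sigma_1^2}}, \qquad \frac{d\beta}{dy} = \frac{N}{\sqrt{2\pi}\,\sigma_2}\,e^{-\frac{y^2}{2\sigma_2^2}}.$$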


It is therefore possible by aid of Sheppard’s Tables to construct the contour 
lines of the surface of constant Q for this relatively simple case. But the surface 
is far from simple and its complex equation seems to indicate that association as 
measured by Q is of a very arbitrary character. We have constructed the surface 
of constant association for the special case of Gaussian marginal frequencies, when 
Q = .6. The photograph of the surface, the regression lines and the contours will
be published on another occasion. It suffices here to note: (i) that the arrays are
heteroscedastic, varying from homoscedasticity of the mid-section to a skewness of
.16 when $x/\sigma_1 = 1.5$ and to a skewness of .20 when $x/\sigma_1 = 3.5$. (ii) The regression
line is most markedly skew, in shape like a Galton ogive, so that there is a 
maximum of regression, and therefore correlation, at the centre of the surface, 
while the regression and therefore correlation reduce to zero as we move outwards. 
No frequency surfaces in actual practice exhibit, as far as we are aware, these 
features demanded by constant Yulean association. 



















MISCELLANEA. 


I. The Correction to be made to the Correlation Ratio for Grouping*. 
By STUDENT, 


Using the ordinary notation, viz. $n_{x_p}$ = the number in the $x_p$ array of y’s whose mean is at $x_p$,
$\bar y_{x_p}$ = the mean of this array, $N$ the total number in the sample, and $\bar y$ the general mean of y,
we have $\eta^2$ defined by the relation

$$\eta^2 = \frac{S\{n_{x_p}(\bar y_{x_p}-\bar y)^2\}}{N\sigma_y^2} \qquad \text{(i).}$$


If $\eta^2$ is required to fit a regression curve to the actual observations as in Professor Pearson’s
original memoir “On the General Theory of Skew Correlation and Non-linear Regression,” no 
correction is necessary. 

But if we require a ratio which shall remain constant under wide variations of grouping 
and of number in the sample and which shall consequently be more comparable from one sample 
to another, there are two corrections to be made. 

The first of these has already been given by Professor Pearson (Biometrika, Vol. VIII. p. 256),
and he has expressed it as follows: If $\eta^2$ be the value actually found by the use of (i),
and $\bar\eta^2$ be the value which would be found from an infinitely large sample, then if $\kappa$ be the
number of x arrays





But there is a further effect of grouping which has not hitherto been noted and which can be 
evaluated as follows : 

Suppose the $x_p$ array to be divided into elementary x arrays and let $y_p$ be the mean of
the $x_p$ elementary array and $n_p$ its frequency.

Then clearly the proper contribution of the $x_p$ array to $\eta^2$ is

$$\frac{S\{n_p(y_p-\bar y)^2\}}{N\sigma_y^2}.$$

This is equal to

$$\frac{S\{n_p(\bar y_{x_p}-\bar y+y_p-\bar y_{x_p})^2\}}{N\sigma_y^2} = \frac{1}{N\sigma_y^2}\left[S\{n_p(\bar y_{x_p}-\bar y)^2\} + 2S\{n_p(\bar y_{x_p}-\bar y)(y_p-\bar y_{x_p})\} + S\{n_p(y_p-\bar y_{x_p})^2\}\right].$$

Now $\bar y_{x_p}-\bar y$ is of course constant for this summation, $S(n_p)=n_{x_p}$, and $S\{n_p(y_p-\bar y_{x_p})\}=0$;
therefore the contribution to $\eta^2$ is

$$\frac{n_{x_p}(\bar y_{x_p}-\bar y)^2}{N\sigma_y^2} + \frac{S\{n_p(y_p-\bar y_{x_p})^2\}}{N\sigma_y^2}.$$
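
The splitting of the proper contribution of a group into the ordinary term and a within-group term can be illustrated with a few figures. The sketch below uses hypothetical frequencies and means for three elementary arrays inside one group, and omits the common divisor (N times the squared standard deviation of y), which does not affect the identity. All names and numbers are ours.

```python
def weighted_mean(freqs, values):
    """Mean of the values weighted by their frequencies."""
    return sum(n * v for n, v in zip(freqs, values)) / sum(freqs)

# Hypothetical elementary arrays inside a single group.
n_p = [12, 20, 8]                 # frequencies of the elementary arrays
y_p = [118.0, 119.5, 121.0]       # means of y in the elementary arrays
y_bar = 120.2                     # hypothetical general mean of y

n_group = sum(n_p)
y_group = weighted_mean(n_p, y_p)  # mean of y for the whole group

# Proper contribution: every elementary array measured from the general mean.
proper = sum(n * (y - y_bar) ** 2 for n, y in zip(n_p, y_p))

# Ordinary contribution: the whole group treated as a single array.
ordinary = n_group * (y_group - y_bar) ** 2

# Second term: scatter of the elementary means about the group mean.
within = sum(n * (y - y_group) ** 2 for n, y in zip(n_p, y_p))

print(proper, ordinary + within)
```

Since the second term is never negative, the ordinary calculation from grouped arrays cannot exceed the proper contribution.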


* See above p. 118 of this Journal. 
The first of these two terms is that which is obtained in the ordinary way, so the contribution
of each array should be corrected by the addition of the second term and $\eta^2$ itself by
the addition of

$$\frac{S\left[S\{n_p(y_p-\bar y_{x_p})^2\}\right]}{N\sigma_y^2} \qquad \text{(iv).}$$

Now if Professor Pearson’s correction (ii) has been made we may take the point whose
coordinates are $(x_p, y_p)$ to lie on the regression line, and if further we assume the regression
line to be linear throughout the $x_p$ group and to be inclined at an angle of $\tan^{-1}\left(\eta_p\,\sigma_y/\sigma_x\right)$ to the
horizontal we have

$$y_p = x_p\,\eta_p\frac{\sigma_y}{\sigma_x} \quad \text{and} \quad \bar y_{x_p} = \bar x_p\,\eta_p\frac{\sigma_y}{\sigma_x}.$$

Hence (iv) becomes

$$\frac{1}{N\sigma_x^2}\,S\left[\eta_p^2\,S\{n_p(x_p-\bar x_p)^2\}\right] \qquad \text{(v).}$$


Now $S\{n_p(x_p-\bar x_p)^2\}$ is the second moment of the $x_p$ group about its own mean and when the
distribution is known can often be approximately evaluated. Similarly when the distribution is
known $\eta_p$ can be estimated and the correction to $\eta^2$ calculated group by group.

But by making certain assumptions we can very much simplify the work, and a practical
test, in which the assumptions are not justified, will show the sort of errors which are
introduced.

The first assumptions are that the regression is linear and the arrays homoscedastic. In
this case of course $\eta_p$ is constant and equal to $\eta$; we are practically determining a value of $r$ by
the $\eta$ method.

The correction then becomes

$$\frac{\eta^2}{N\sigma_x^2}\,S\left[S\{n_p(x_p-\bar x_p)^2\}\right],$$

or writing $\lambda^2 = S\left[S\{n_p(x_p-\bar x_p)^2\}\right]/N\sigma_x^2$ and $H^2$ for the raw value of $\eta^2$ after using Pearson’s correction,
we get from (iii) $\eta^2 = H^2 + \eta^2\lambda^2$ or

$$\eta^2 = \frac{H^2}{1-\lambda^2}.$$

To obtain a value for $\lambda^2$ we still require to postulate something of the nature of the distri-
bution and I propose to treat (i) of the case where the unit of grouping is constant and small 
enough for the frequency in each group to be considered to be distributed as a trapezium, 
and (ii) of the case where the frequency distribution is normal. 


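(i) First to find the second moment of a trapezium about its mean.

Let $z_0$ and $z_1$ be the ordinates forming the ‘walls’ of the trapezium and let the group
unit be $h$.

Then $y = z_0 + \left(\dfrac{z_1-z_0}{h}\right)x$ is the equation to the ‘roof’ referred to the ‘floor’ and left hand
‘wall’ as axes. The area is clearly $\left(\dfrac{z_0+z_1}{2}\right)h$.

The mean is at

$$\bar x = \frac{2}{(z_0+z_1)h}\int_0^h\left\{z_0x + \frac{(z_1-z_0)x^2}{h}\right\}dx = \frac{h}{3}\cdot\frac{z_0+2z_1}{z_0+z_1}.$$

The second moment coefficient about the axis of y is

$$\frac{2}{(z_0+z_1)h}\int_0^h\left\{z_0x^2 + \frac{(z_1-z_0)x^3}{h}\right\}dx = \frac{2}{(z_0+z_1)h}\left\{\frac{(z_1-z_0)h^3}{4} + \frac{z_0h^3}{3}\right\} = \frac{h^2}{6}\cdot\frac{3z_1+z_0}{z_1+z_0}.$$

The second moment coefficient about the mean is

$$\frac{h^2}{6}\cdot\frac{3z_1+z_0}{z_1+z_0} - \frac{h^2}{9}\cdot\frac{(z_0+2z_1)^2}{(z_0+z_1)^2} = \frac{h^2}{18}\cdot\frac{z_0^2+4z_0z_1+z_1^2}{(z_0+z_1)^2} = \frac{h^2}{12}\left\{1-\frac{1}{3}\left(\frac{z_1-z_0}{z_1+z_0}\right)^2\right\}.$$

Clearly when $h$ is reasonably small $\left(\dfrac{z_1-z_0}{z_1+z_0}\right)^2$ is a quantity of the second order and in
this case

$$\lambda^2 = \frac{h^2}{12\sigma_x^2},$$

so that

$$\eta^2 = \frac{H^2}{1-\dfrac{h^2}{12\sigma_x^2}} \qquad \text{(viii),}$$

when the unit of grouping is uniform and small.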
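
The second moment of a trapezium about its mean, and the resulting correction for a uniform and small unit of grouping, can be checked with a short sketch. The wall ordinates, the group unit, the value of H squared and the standard deviation used below are hypothetical, and the function names are ours. The numerical integration by the mid-ordinate rule is only a check on the closed form, not part of the method.

```python
def trapezium_second_moment(z0, z1, h):
    """Closed form: (h^2 / 12) * {1 - (1/3) * ((z1 - z0) / (z1 + z0))^2}."""
    r = (z1 - z0) / (z1 + z0)
    return h * h / 12.0 * (1.0 - r * r / 3.0)

def numeric_second_moment(z0, z1, h, steps=20000):
    """Second moment about the mean by mid-ordinate summation under the roof."""
    dx = h / steps
    area = first = second = 0.0
    for i in range(steps):
        x = (i + 0.5) * dx
        y = z0 + (z1 - z0) * x / h      # ordinate of the roof at x
        area += y * dx
        first += x * y * dx
        second += x * x * y * dx
    mean = first / area
    return second / area - mean * mean

def corrected_eta_squared(H2, h, sigma_x):
    """eta^2 = H^2 / (1 - h^2 / (12 sigma_x^2)), small uniform grouping unit."""
    lambda2 = h * h / (12.0 * sigma_x * sigma_x)
    return H2 / (1.0 - lambda2)

# Hypothetical wall ordinates and group unit.
closed = trapezium_second_moment(3.0, 5.0, 2.0)
checked = numeric_second_moment(3.0, 5.0, 2.0)

# Hypothetical H^2, unit of grouping and standard deviation of x.
eta2 = corrected_eta_squared(0.0800, 2.0, 3.0)

print(closed, checked, eta2)
```

When the two walls are equal the closed form reduces to one twelfth of the squared group unit, which is the value used in the simplified correction.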







(ii) When the unit of grouping is neither uniform nor small and there is no special knowledge
of the nature of the distribution, we must needs fall back on the Gaussian curve to give us
a first approximation to $z_0$ and $z_1$ for each group.

In this case

$$1-\lambda^2 = N\,S\left\{\frac{(z_0-z_1)^2}{n_{x_p}}\right\} \qquad \text{(ix)*,}$$

and it is necessary to determine it, after fitting the frequency by means of Sheppard’s tables.
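
For a Gaussian distribution the quantity 1 - λ² can be computed directly from the boundaries of the groups. The sketch below rests on our own assumptions: the boundaries are expressed in units of the standard deviation, z0 and z1 are the ordinates of the unit normal curve at the two walls of a group, and the proportion of the frequency in the group is taken from the normal integral in place of Sheppard's tables. The function names are ours.

```python
import math

def ordinate(t):
    """Ordinate of the normal curve of unit area and unit standard deviation."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def area_below(t):
    """Proportion of the normal frequency below t."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def one_minus_lambda_squared(boundaries):
    """Sum over groups of (z0 - z1)^2 divided by the proportion in the group.

    boundaries: interior dividing points, ascending, in units of sigma;
    the two outer groups are taken to run to infinity.
    """
    cuts = [-math.inf] + list(boundaries) + [math.inf]
    total = 0.0
    for lower, upper in zip(cuts[:-1], cuts[1:]):
        z0 = 0.0 if math.isinf(lower) else ordinate(lower)
        z1 = 0.0 if math.isinf(upper) else ordinate(upper)
        proportion = area_below(upper) - area_below(lower)
        total += (z0 - z1) ** 2 / proportion
    return total

# Two groups divided at the mean.
two_groups = one_minus_lambda_squared([0.0])

# Many narrow groups of width one tenth of sigma.
fine = one_minus_lambda_squared([i / 10.0 for i in range(-40, 41)])

print(two_groups, fine)
```

With only two groups the value falls far below unity, so the divisor applied to H² is large. With many narrow groups it approaches unity and the correction becomes negligible.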
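Finally, what correction, if any, is to be made for the grouping of y?
This will become more apparent from the alternative formula for $\eta^2$, namely

$$\eta^2 = 1-\frac{S(y-\bar y_{x_p})^2}{N\sigma_y^2}.$$

For the second moment of each array should be corrected by the subtraction of $\dfrac{k^2}{12}$, where $k$
is the unit of grouping of y, so that

$$\eta^2 = 1-\frac{S(y-\bar y_{x_p})^2-\dfrac{Nk^2}{12}}{S(y-\bar y)^2-\dfrac{Nk^2}{12}}$$

$$= \frac{S(y-\bar y)^2-S(y-\bar y_{x_p})^2}{S(y-\bar y)^2-\dfrac{Nk^2}{12}}$$

$$= \frac{S(y-\bar y_{x_p}+\bar y_{x_p}-\bar y)^2-S(y-\bar y_{x_p})^2}{N\sigma_y^2}$$

$$= \frac{S(y-\bar y_{x_p})^2+2S(y-\bar y_{x_p})(\bar y_{x_p}-\bar y)+S(\bar y_{x_p}-\bar y)^2-S(y-\bar y_{x_p})^2}{N\sigma_y^2}$$

$$= \frac{S\{n_{x_p}(\bar y_{x_p}-\bar y)^2\}}{N\sigma_y^2},$$

since $S(\bar y_{x_p}-\bar y)^2$ when summed for each individual becomes $S\{n_{x_p}(\bar y_{x_p}-\bar y)^2\}$ when summed for
each array, and $S(y-\bar y_{x_p})(\bar y_{x_p}-\bar y)$ vanishes for each array.

Hence there is no correction to be made for the y grouping except Sheppard’s correction for
the Standard Deviation of y.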


* The suggestion of this formula I owe to Professor Pearson. 
I have tested the results on an instance given in Professor Pearson’s original memoir, namely
the age and auricular height in Girls, correlation table pp. 34 and 54. 


The means of the arrays in the full table are as follows:

[Table: columns I to III (Even Grouping) and IV to VII (Uneven Grouping) bracket the age classes combined under each method of grouping; the bracketing and the entries marked ... are not legible.]

Age       Mean Auricular Height    Number of Cases
3-4       115.25                   1
4-5       116.9643                 ...
5-6       117.4722                 ...
6-7       119.1000                 40
7-8       120.3026                 76
8-9       121.6340                 125
9-10      121.7246                 177
10-11     ...                      ...
11-12     ...                      261
12-13     123.8908                 309
These were grouped in seven ways, in three of which the groups were of equal width, and the
other four give an attempt at equal frequency: the method of grouping is set out by means
of columns headed in Roman numerals. The age distribution differs significantly from the
normal, the constants being $\beta_1 = .0013$, $\beta_2 = 2.7101$, but it would perhaps have been better to
have selected a less normal distribution: still it represents the ordinary ‘cocked hat’ statistics
that tend to occur.


The regression is certainly not very linear, the growth apparently ceasing at about 18-19. 


The values of $\eta^2$ (the raw value), $H^2$ (the value after using Professor Pearson’s correction) and
$\eta^2$ (the value after attempting to use the $\lambda^2$ correction) are given in the following table.

                                                         λ² = h²/12σ_x²      1-λ² from Normal Curve
Number of    Number of    η² (raw)    η       H²         η²        η         η²        η
Grouping     Groups
I            20           .09183     .303    .08414     .08489    .291      .08494    .291
II           10           .08657     .294    .08290     .08595    .293      .08510    .292
III           5           .07701     .278    .07535     .08786    .296      .08635    .294
IV            9           .08836     .297    .08510     -         -         .08953    .299
V             6           .08342     .289    .08136     -         -         .08913    .299
VI            5           .08218     .287    .08053     -         -         .08885    .298
VII           2           .06203     .249    .06159     -         -         .09739    .312
It will be seen that the first three, with even grouping, are very close together though
the number of groups has been reduced from 20 to 5. Similarly the next three are close 
together, and the last is again by itself. 


An examination of the way in which the groups are taken shows that the more the tail 
is bunched together the higher is the value found, and this is what would be expected in this 
particular case, since there is practically no increase of head height with age at the ‘old’ end of 
the scale, whereas for purposes of calculation we have assumed a constant angle for the regression 
line. But it may be pointed out that $\eta$ varies (to the 2nd place of decimals) only from .29 to .31
even if we reduce the twenty groups to two, an extreme proceeding which is never done in 
practice. 

At the same time the ordinary six or eight groups may be expected to give results a little 
too high when, as is usual, the regression line is curved. 


II. On the Hereditary Character of General Health. 
By KARL PEARSON, F.R.S. and ETHEL M. ELDERTON, Galton Laboratory. 


(1) In dealing with the heredity of general health we have to meet at once certain funda- 
mental difficulties. We have first the question of environment and secondly the question of 
variety in health caused by what we may term accident. If we deal with families living in 
widely differentiated environments we shall have, or certainly may have, a spurious correlation 
of health in parents and offspring; the resemblance in health will be emphasised. On the 
other hand, when a single member of a family is exposed to a specially differentiated environ- 
ment, i.e. goes to the West Coast of Africa, or spends his life in India, or catches enteric at a 
particularly unfavourable moment, the correlation of general health may be decidedly weakened 
in the case of parent and offspring. These difficulties of differentiated environment and what 
we may, perhaps, term accident cannot be wholly overcome, but we may endeavour to meet 
or measure them. In the first place we can confine our observations to one social class and 
thus go a long way to get differentiated environment removed. If, as in the present paper, we 
deal essentially with the professional classes, there is great uniformity of general environment. 
The food supply is sufficient, the doctor is always at command, physical exercise is fairly general 
and markedly insanitary houses or occupations are practically avoided. We do not think 
therefore that, for the data of the present paper, differential environment is a marked factor 
in producing correlation. On the other hand we do consider it possible that “accident” will 
weaken the relationships sought. The reduction in health-correlations below the values for 
other physical characters, might indeed be taken as a measure of random action on health, 
comparable with the random action of death itself in reducing the correlation of duration 
of life, which has already been discussed by one of us*. Indeed heredity of general health 
is almost as significant for the problem of natural selection, as heredity of duration of life. 

A more serious difficulty in the health-inheritance problem is this very question of death. 
If parents are delicate and health or delicacy is hereditary, they will have delicate children, 
and we may anticipate that more of these children will die than in the case of the children
of robust or normally healthy parents. Thus only the healthier children of delicate parents
will survive for us to record their state of health, and accordingly the offspring of delicate
parents will appear healthier than they really should be owing to the selective death-rate. In
our investigations we have dealt only with offspring who lived to be adult, i.e. to at least
21 years of age, so that an appreciation of general health could be formed. There is also a 
further difficulty that very delicate parents are themselves likely to die, and we have again 


* R. S. Proc. Vol. 65, p. 290, and Biometrika, Vol. I. p. 50.
less chance of getting their record. Hence the problem is by no means without difficulty, but it
is of such great interest that we venture to give here our chief results. 

Our health classes were: Very Robust, Robust, Normally Healthy, Rather Delicate, Delicate 
and Very Delicate. These have been verbally defined, and our present definitions are as
follows : 


SCALE. 
V.R. Very robust: He has never had to see a doctor, nor been off work through illness.

R. Robust: He has only seen a doctor about minor ailments, and has only been off
work for colds, etc.

N.H. Normally healthy: He has not had more than one serious illness, involving, say,
a fortnight’s absence from work during the last ten years.

R.D. Rather delicate: He has had more than one serious illness, but not more than one
involving more than four weeks’ absence from work during the last ten years.

D. Delicate: He is off work through illness at least four weeks in all every year.

V.D. Very delicate: He is in a chronic state of ill-health.


But these definitions were not used throughout the whole of the records included in the present 
investigation, and we found considerable reluctance to the use of the “Very Delicate” category.
Accordingly in the present reduction all the delicate categories have been clubbed together. 
As in previous investigations a Gaussian frequency scale was used for Health*, the interval 
on the scale covered by “ Normally Healthy” being taken to represent 100 units of health, 
which may be called sanitaces. These hundred sanitaces were supposed to represent the range 
of normal health of each type of individuals—fathers, mothers, sons and daughters, and the rest 
of the distribution calculated in terms of them. At the mean of the parental category was 
then plotted up the mean number of sanitaces of all the offspring of either sex of parents of 
the given category. Thus the round black dots of our four first diagrams were obtained. It 
was found that the three points marking the mean health of offspring of “Very Robust,” 
“Robust” and “Normally Healthy” parents were in all four cases closely on a line which passed
through the mean of parental and filial health, but that the mean health of children of delicate 
parents in all four cases gave a point lying markedly above this line. This line in all four cases 
shows a marked slope indicating that as the health of the parents is worse so the health of their 
offspring is worse. 

To illustrate the probable source, or at least part source, of the anomaly in the case of the 
delicate parents, the percentage of sons dying before 21 years of age and so escaping record 
of health was obtained for each health group of the fathers and this percentage was plotted in 
Diagram V. to a Gaussian scale of Father’s health. Again we note a fairly uniform increase of 
the percentage of deaths with decreasing goodness of health of the father, for the three better 
categories, but a very marked excess of deaths, quite off the line, in the category of delicate
fathers, where some 13 % of sons die as against the 7 % in the case of “Very Robust” fathers.

(2) Still preserving our Gaussian scales, we may consider that it is the sons of delicate 
fathers who have been especially selected. In this case the health of the sons of the three 
left-hand categories of fathers will give very nearly the true regression line for unselected
material. We have made no attempt to algebraically fit the best lines to these three points,
but placed them graphically on the points and through the general means†.


* Cf. Huxley Lecture, Biometrika, Vol. III. p. 146.

† This was done because the three left-hand points always fall nearly on the same line with the
general means, but of course the means of both parents and offspring would be somewhat lowered had 
we the full complement of delicate sons of delicate parents :—to increase the number of such sons 
would ipso facto increase the number of such parents and thus both means of health in offspring 
and parents would be lowered in the same direction probably approximately that of the true regression 
line. 

But there is another disturbing factor, which the diagrams make at once obvious: the mean
health of parents is in very considerable degree better than that of the offspring. Measuring 
our mean health of any population in sanitaces from the division between “Robust” and 
“Normally Healthy,” negative towards normally healthy, we have the following results : 


TABLE I.
Mean Healths of Various Populations.

Population                 Mean health     Difference within pair
Fathers of Sons:            + 2.72 sa.
Fathers of Daughters:       - 2.09 sa.          + 4.81
Mothers of Sons:            - 20.22 sa.
Mothers of Daughters:       - 17.73 sa.         - 2.49
Sons of Fathers:            - 17.11 sa.
Sons of Mothers:            - 16.82 sa.         - 0.29
Daughters of Fathers:       - 35.53 sa.
Daughters of Mothers:       - 34.64 sa.         - 0.89


These results show at once that the health of the parents is far better than that of their 
offspring. This does not imply that the younger generation has degenerated but only that 
there is a selection of the more healthy for parentage ; the more robust men and women are, 
the more likely they are to be parents and repeated parents. The differences between sons of
fathers and sons of mothers are due to a difference of material, and the same applies to daughters
of fathers and daughters of mothers; these differences are probably those of random sampling.
The differences in the cases of parents are more marked and possibly significant. If they be, 
then we have some slight suggestion that a healthy father and a delicate mother would be more 
likely to have sons and a delicate father and a healthy mother daughters. Even if there be 
anything in the suggestion, it would only be shown in large numbers, and is not a universal rule 
for individual pairs. All we can say is that the numbers do not flatly contradict a popular 
impression of the kind. 


(3) Another result brought out by our numbers is that the health of the male is markedly 
better than that of the female in both generations ; this is possibly the effect of a more stringent 
selection of the male. Such a selection may be of two kinds, first the known heavier death-rate 
of the male, and secondly a greater objection to admission of delicacy or even to giving a record 
at all on the part of the delicate male. The general difference can be seen in the following 
percentages of delicate individuals : 


TABLE II.
Percentages of Delicate Individuals in Various Populations.

Fathers of Sons:           7.76 %.
Fathers of Daughters:      9.68 %.
Sons of Fathers:          13.69 %.
Sons of Mothers:          13.57 %.
Mothers of Sons:          16.65 %.
Mothers of Daughters:     16.56 %.
Daughters of Fathers:     20.44 %.
Daughters of Mothers:     20.82 %.


There is an increase in the delicacy rate as we pass from Fathers of Sons to Fathers of
Daughters; a slight fall only in the delicacy rate as we pass from Mothers of Sons to Mothers
of Daughters. But both are supported by the rates of “Very Robust” where we find the
following percentages: Fathers of Sons 18.76; Fathers of Daughters 17.91; Mothers of Sons
11.28; Mothers of Daughters 13.02.
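
The percentages quoted above can be checked against the marginal totals of the General Health tables printed at the end of this note. The following Python sketch is illustrative only: the dictionary layout and the function name `percentage` are our own, and the totals are copied from the Father's Health margins of those tables.

```python
# Marginal totals of Father's Health, copied from the General Health tables.
# The data layout and names below are illustrative assumptions.
fathers_of_sons = {
    "Very Robust": 349.5,
    "Robust": 610.0,
    "Normally Healthy": 759.0,
    "Delicate": 144.5,
}
fathers_of_daughters = {
    "Very Robust": 347.5,
    "Robust": 599.5,
    "Normally Healthy": 802.5,
    "Delicate": 187.5,
}


def percentage(totals, category):
    """Share of one health category, in per cent of the whole population."""
    return 100.0 * totals[category] / sum(totals.values())


for name, totals in (("Fathers of Sons", fathers_of_sons),
                     ("Fathers of Daughters", fathers_of_daughters)):
    # Print the delicacy rate of each population to two decimal places.
    print(name, round(percentage(totals, "Delicate"), 2))
```

The same function applied to the row totals of each table would give the rates for sons and daughters.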
(4) In the following table the Standard Deviations of the various populations are given also
in terms of sanitaces. 


TABLE III.
Standard Deviations of Health in Terms of Sanitaces of Various Populations.

Fathers of Sons:          72.25 sa.
Fathers of Daughters:     75.31 sa.
Mothers of Sons:          82.40 sa.
Mothers of Daughters:     84.65 sa.
Sons of Fathers:          75.73 sa.
Sons of Mothers:          75.64 sa.
Daughters of Fathers:     78.06 sa.
Daughters of Mothers:     80.42 sa.


The definite conclusions we can draw from this table are: first, that women are definitely 
more variable in health than men, and secondly that daughters seem less variable in health 
than their mothers, while sons are possibly but only slightly more variable than their fathers. 
On the whole the nature of the health selection between the older and younger generations 
seems to be of a character which leaves the variability only slightly modified, but shifts very 
considerably the mean health as a whole from parents to children. 
(5) The results so far reached enable us to test the degree of stability of our scale. If that
scale is a reasonable one, the range of the “Robust” category ought to come out with reasonable
sameness when measured in sanitaces for the different populations. We find:


TABLE IV.
Range of Robust for Various Populations.

Fathers of Sons:          66.80 sa.
Fathers of Daughters:     67.02 sa.
Sons of Fathers:          74.80 sa.
Sons of Mothers:          76.95 sa.
Mothers of Sons:          79.61 sa.
Mothers of Daughters:     77.52 sa.
Daughters of Fathers:     81.54 sa.
Daughters of Mothers:     82.47 sa.


It will, we think, be clear from these results that the terms employed in our categories have 
been used in rather different senses when applied to males and females, and when applied to 
the younger and older generation. If we assume “Normal Health” to be the same for all types,
then the category “Robust” has been used in a wider sense for women than for men, and in a
wider sense for the younger than the older generation. Nor does this seem unreasonable when
we compare ordinary practice, in which undoubtedly a different health scale is applied to men
and women, and to old and young. As we have seen there is on an average a lower state of
health in women than in men and we are apt to judge by deviation from the average rather
than by absolute condition. Again the average health of the older generation is higher than
that of the younger, and one is rather apt to compare the health of the offspring with that 
of the parent instead of applying an absolute standard to both. Anyhow without laying much 
stress on the reasons for the personal equation, it is probable that judgment does differ in the 
matter of health according as we are dealing with man or woman, and with the old or young 
generation. 


Speaking in quite round numbers we may say that the range of our “Robust” in women,
regardless of whether they are parents or not, is about 80 sanitaces and this is almost equal to
the standard deviation of their health (mean of four populations 81.4 sa.). In the case of men
of every status, the range (71.39 sa.) of the “Robust” is slightly less than the standard deviation
(74.7 sa.), but for many purposes it may be sufficient to consider both as 75 sanitaces. We
have no means of ascertaining the absolute health in sanitaces of any individual, but if we were
to assume 300 sanitaces as the stock of an individual on the border of “Normal Health” and
“Robustness,” the average man would have about 292 and the average woman about 273 units of
health; one woman in a thousand would have less than 29 units of health, and one man less
than 67. The “Very Robust” man would be a man with more than 371 units and the “Very
Robust” woman would be a woman with more than 380 units. Finally one man and one
woman also in a thousand would have more than 517 units of health. Thus while the most 
robust men and women in the thousand are of the same calibre, the most delicate woman 
has less health than the most delicate man,—a result possibly of the more stringent death- 
rate ; a man needs more health to survive at all. Of course these results are purely suggestive, 
but they flow with some probability from the lower average health of women and their greater 
variability. We should not desire to place any great weight on them. Other data for different 
age groups and social classes will be discussed later, and then it will be more possible to propose 
with greater certitude a definite health scale. 
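
The round figures of this paragraph can be retraced with a few lines of Python. This is a sketch under stated assumptions: the quoted limits (29, 67 and 517 units) appear to correspond to three standard deviations on either side of the mean, which is our reading and is not stated in the text; the means of about -8 sa. for men and -27 sa. for women are rounded averages of the mean healths tabulated earlier; and the function name is hypothetical.

```python
# Assumed stock of health at the border of "Normal Health" and "Robustness".
BORDER = 300.0


def thousandth_limits(mean_health, sd, k=3.0):
    """Lower and upper limits at k standard deviations from the mean.

    k = 3 is an assumption inferred from the figures quoted in the text.
    """
    centre = BORDER + mean_health
    return centre - k * sd, centre + k * sd


# Round values: mean health about -8 sa. (men) and -27 sa. (women),
# standard deviations about 75 sa. (men) and 81.4 sa. (women).
man_low, man_high = thousandth_limits(-8.0, 75.0)
woman_low, woman_high = thousandth_limits(-27.0, 81.4)

print(round(man_low), round(man_high))
print(round(woman_low), round(woman_high))
```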


(6) Looked at from the average environment of the professional classes, there can be little 
doubt that our diagrams indicate that general health is a hereditary character ; but we have of 
course to regard the difficulty of the record in regard to delicate offspring of delicate parents.
We want if possible to get some measure of the correlation in health of parent and offspring
from our first three points in each diagram.

Clearly we can find the slopes of the regression line passing through these three points. If
$\sigma_O$ be the standard deviation of offspring and $\sigma_P$ of parents, the slope

$$s = \frac{\sigma_O}{\sigma_P} \times r_{OP}.$$

This of course is independent of normality. Again the weighted mean square deviation of
arrays has the value $\sigma_O^2 (1 - r_{OP}^2)$. Let the S.D.'s of Very Robust, Robust, and Normally
Healthy be $\sigma_{a_1}$, $\sigma_{a_2}$, and $\sigma_{a_3}$, then

$$\sigma_O^2 (1 - r_{OP}^2) = \frac{n_1 \sigma_{a_1}^2 + n_2 \sigma_{a_2}^2 + n_3 \sigma_{a_3}^2}{n_1 + n_2 + n_3},$$

$$\frac{1 - r_{OP}^2}{r_{OP}^2} = \frac{n_1 \sigma_{a_1}^2 + n_2 \sigma_{a_2}^2 + n_3 \sigma_{a_3}^2}{(n_1 + n_2 + n_3)\, \sigma_P^2 \times s^2}.$$

Now $\sigma_{a_1}$, $\sigma_{a_2}$, $\sigma_{a_3}$ can for each array be expressed in terms of $h$, the 100 sanitaces of the
“Normally Healthy” range for that array, and $s$ can be carefully measured on the diagrams. It
only remains to consider what value shall be given to $\sigma_P$. Undoubtedly some parents are
omitted because they have died from delicacy, but on the whole we are convinced that there has
been rather less selection of parents than of offspring. Accordingly we have put $\sigma_P$ at its value
in terms of the range of “Normally Healthy.” Thus $r_{OP}$ can be calculated, without regarding
the final anomalous array.
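
As a hedged illustration of how the last relation yields the correlation, the Python sketch below solves $(1 - r^2)/r^2 = K$ for $r$, where $K$ is the right-hand side built from the array frequencies, the array S.D.'s, the parental S.D. and the measured slope. The function name and all numerical inputs are hypothetical; the array S.D.'s used by the authors are not printed in this note, so the check uses synthetic values constructed from a chosen correlation.

```python
import math


def correlation_from_arrays(n, sd_arrays, sigma_p, slope):
    """Solve (1 - r^2)/r^2 = sum(n_i*sd_i^2) / (sum(n_i) * sigma_p^2 * slope^2)."""
    # Weighted mean square deviation of the arrays.
    within = sum(ni * sd ** 2 for ni, sd in zip(n, sd_arrays)) / sum(n)
    ratio = within / (sigma_p ** 2 * slope ** 2)
    # (1 - r^2)/r^2 = ratio  implies  r = 1 / sqrt(1 + ratio).
    return 1.0 / math.sqrt(1.0 + ratio)


# Synthetic check (not data from the paper): choose r, sigma_O and sigma_P,
# build the slope and array S.D. they imply, then recover r.
r_true, sigma_o, sigma_p = 0.4, 75.0, 72.0
slope = sigma_o / sigma_p * r_true
array_sd = sigma_o * math.sqrt(1.0 - r_true ** 2)
r = correlation_from_arrays([350, 610, 760], [array_sd] * 3, sigma_p, slope)
print(round(r, 4))
```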

But clearly we have to correct the result for our grouping in arrays of parents, but for
parents only, as the S.D.'s of the arrays have been found from total frequencies of the groups,
on the assumption that each array is normal. We must therefore divide each correlation by
the correlation between class-index and individual character—a point discussed in another 
paper (see pp. 116 and 134 above). 


These corrective factors, the correlations between class-index and individual character, are:

For Fathers in Fathers and Sons:           .9258.
For Fathers in Fathers and Daughters:      .9333.
For Mothers in Mothers and Sons:           .9354.
For Mothers in Mothers and Daughters:      .9384.


Thus we have 


Correlations of Health, Parent and Offspring.

                          Raw*     Corrected    Mean of pair
Father and Son:           .4456      .4813
Father and Daughter:      .2852      .3056          .39
Mother and Son:           .3551      .3796
Mother and Daughter:      .3407      .3631          .37

Mean = .3824.
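
The correction described above is a simple division of each raw correlation by its class-index factor. The following Python sketch (variable names are our own) repeats that arithmetic with the raw values and factors quoted in the text, and recovers the corrected column and its mean.

```python
# Raw correlations and class-index factors as quoted in the text.
raw = {
    "Father and Son": 0.4456,
    "Father and Daughter": 0.2852,
    "Mother and Son": 0.3551,
    "Mother and Daughter": 0.3407,
}
class_index = {
    "Father and Son": 0.9258,
    "Father and Daughter": 0.9333,
    "Mother and Son": 0.9354,
    "Mother and Daughter": 0.9384,
}

# Divide each raw correlation by the correlation between class-index and character.
corrected = {pair: raw[pair] / class_index[pair] for pair in raw}
mean_corrected = sum(corrected.values()) / len(corrected)

for pair, value in corrected.items():
    print(pair, round(value, 4))
print("Mean", round(mean_corrected, 4))
```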


We have also considered the correlations from another standpoint. We have for the slope $s$
of the regression line for the three first points

$$s = \frac{\sigma_O}{\sigma_P}\, r_{OP}.$$


* Corrected of course for defect of delicate offspring of delicate parents, i.e. found from the formula
for $r_{OP}$ above.













Therefore:

$$r_{OP} = \frac{\sigma_P}{\sigma_O} \times s.$$

Now $\sigma_P$ may be slightly too small because some delicate parents will be omitted and $\sigma_O$ will
probably be too small because, as we have seen, many delicate offspring escape record. We
shall thus get rather too large values of $r_{OP}$. There will be no correction to be made this time
as the values of $\sigma_P$ and $\sigma_O$ are based on frequencies occurring between certain limits. The
following values were obtained:

                            Slopes    Correlation    Mean of pair
Fathers and Sons:            .55        .5247
Fathers and Daughters:       .33        .3184           .42
Mothers and Sons:            .345       .3758
Mothers and Daughters:       .34        .3578           .37

Mean = .3942.
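
These correlations follow from the measured slopes and the standard deviations tabulated earlier. The Python sketch below repeats the computation; the function name is our own, the pairing of each slope with the S.D.'s of the corresponding parent and offspring populations is our reading of the text, and the slope for Mothers and Sons is taken here as .345.

```python
def correlation_from_slope(slope, sigma_p, sigma_o):
    """r_OP = (sigma_P / sigma_O) * s, with S.D.'s in sanitaces."""
    return slope * sigma_p / sigma_o


# (slope, S.D. of parents, S.D. of offspring) for each pair of populations.
cases = {
    "Fathers and Sons": (0.55, 72.25, 75.73),
    "Fathers and Daughters": (0.33, 75.31, 78.06),
    "Mothers and Sons": (0.345, 82.40, 75.64),
    "Mothers and Daughters": (0.34, 84.65, 80.42),
}

correlations = {pair: correlation_from_slope(*values)
                for pair, values in cases.items()}
mean_correlation = sum(correlations.values()) / len(correlations)

for pair, value in correlations.items():
    print(pair, round(value, 4))
print("Mean", round(mean_correlation, 4))
```

The results agree with the tabulated values to within a unit in the fourth decimal place.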


This confirms the values previously obtained and we think we may adopt them and especially
the mean* value .3824 first reached as at least a fair approximation. Now suppose the
chance of an individual’s health being due to some other cause than heredity to be $p$, then in
a population of $N$ pairs the chance of both members of a pair having their health as a natural inheritance
will be $(1-p)^2$ and the ratio of correlated to the total material will be $(1-p)^2 N/N$; this
will measure the reduction in the correlation of health between parent and offspring due to
accidental and extraneous causes.


Now the full strength of heredity for physical characters has been shown in the professional
classes to be on an average about .46: see Biometrika, Vol. II. p. 357. Hence we have:

$$(1-p)^2 = .3824/.46,$$

or: $p = .0743$.
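
A quick numerical check of this step, offered as a sketch with names of our own choosing: solving $(1-p)^2 = r/.46$ for $p$ gives about .088 when $r$ is the first mean .3824, and .0743 when $r$ is the second mean .3942. The printed figure .0743 therefore agrees with the second mean.

```python
import math

# Full strength of heredity for physical characters, as quoted in the text.
FULL_HEREDITY = 0.46


def accidental_chance(observed_r, full_r=FULL_HEREDITY):
    """Solve (1 - p)^2 = observed_r / full_r for p."""
    return 1.0 - math.sqrt(observed_r / full_r)


p_first = accidental_chance(0.3824)   # mean of the corrected correlations
p_second = accidental_chance(0.3942)  # mean of the correlations found from slopes

print(round(p_first, 4), round(p_second, 4))
```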


Thus the health of only 7.4 % of the population is determined by accidental causes. In
92.6 %, i.e. in the great bulk of cases, heredity is the chief factor in the determination of general
health. Without laying too much stress on the actual figures, we think it may safely be concluded
that heredity in the professional classes is the chief source of the good or bad health of
individuals.


Tables. General Health.

TABLE I.
Father’s Health (columns) and Son’s Health (rows).

Son’s Health       | Very Robust | Robust  | Normally Healthy | Delicate | Totals
Very Robust        |   111.25    |  63.75  |      19          |  15.5    |  209.5
Robust             |   117.25    | 237.5   |     161.5        |  39.25   |  555.5
Normally Healthy   |    88       | 236     |     461.5        |  57.5    |  843
Delicate           |    33       |  72.75  |     117          |  32.25   |  255
Totals             |   349.5     | 610     |     759          | 144.5    | 1863


* There is something anomalous in the high value of the father and son correlation, but we have 
not been able to trace it to any definite origin. 




















TABLE II.
Mother’s Health (columns) and Son’s Health (rows).

Son’s Health       | Very Robust | Robust  | Normally Healthy | Delicate | Totals
Very Robust        |    76       |  47.5   |      42          |  27      |  192.5
Robust             |    55.5     | 215.75  |     201.5        |  72.25   |  545
Normally Healthy   |    61       | 189.75  |     424.5        | 134.25   |  809.5
Delicate           |     9.5     |  66.5   |     102.5        |  64.5    |  243
Totals             |   202       | 519.5   |     770.5        | 298      | 1790

TABLE III.
Father’s Health (columns) and Daughter’s Health (rows).

Daughter’s Health  | Very Robust | Robust  | Normally Healthy | Delicate | Totals
Very Robust        |    53.5     |  33.25  |      25.5        |  16.25   |  128.5
Robust             |   117.5     | 207     |     149          |  26.5    |  500
Normally Healthy   |   110       | 245.25  |     462.25       |  95      |  912.5
Delicate           |    66.5     | 114     |     165.75       |  49.75   |  396
Totals             |   347.5     | 599.5   |     802.5        | 187.5    | 1937

TABLE IV.
Mother’s Health (columns) and Daughter’s Health (rows).

Daughter’s Health  | Very Robust | Robust  | Normally Healthy | Delicate | Totals
Very Robust        |    52       |  38.5   |      30.5        |  15.5    |  136.5
Robust             |    76       | 191.25  |     164.25       |  58      |  489.5
Normally Healthy   |    85.5     | 213     |     431.75       | 130.75   |  861
Delicate           |    31       |  95.75  |     157.5        | 106.75   |  391
Totals             |   244.5     | 538.5   |     784          | 311      | 1878
III. Note on the Honduras Piebald.
By KARL PEARSON, F.R.S. 


The first report of this piebald with photographs was, I believe, brought back to Europe 
by M. le Comte Maurice de Périgny. Professeur R. Blanchard published an account based 
on the Comte de Périgny’s data in the Bulletin de la Société française d’Histoire de la Médecine,
t. IX. p. 213, Paris, 1910. Through the courtesy of the Comte de Périgny I have been provided
with copies of his photographs, which will appear in the Second Part of the Monograph on
Albinism soon to be issued by E. Nettleship, C. H. Usher and myself, and are reproduced as
Plates X. and XI. here. This piebald boy is of much interest because he belongs essentially to 
the “classical type,” illustrated in paintings of the 18th century, and no living piebald of this 
type had so far come to our knowledge, although we had in the First Part of our monograph 
given many illustrations of early cases. 


But there is further scientific interest in this piebald because he supports the point of view 
emphasised by me in the monograph referred to, that when a pure race is crossed with a mixed 
race—a hybrid between races with markedly different degrees of pigmentation—then piebalds 
are likely to appear de novo. On this ground it seems to me idle to speak of piebaldism as
a Mendelian unit character, and it is idle equally to talk of it as a latent unit character, for there
is no evidence at all that it ever occurred before in the pure races whose crossing leads to
these piebalds. In the present case the mother has Mexican and negro blood, the Mexican
being already a mixture of Spanish and American Indian. She thus combines black, red and 
white races. The father is a coal-black pure African. But several of our piebald pedigrees show 
that it is sufficient for piebaldism when a pure red or white is crossed with a pure dark race 
and then the hybrid be mated again with either pure race, the produce of this second cross may 
be a piebald. The light race may be merely an albino variety of the dark race, and in several 
of our dark race pedigrees we find such piebalds occurring in stocks wherein albinism has 
occurred also. Indeed given an albino occurring in a dark race it seems possible from the 
hybrid between normal and albino by crossing again with normal or albino to produce almost 
every shade of colour as well as every variety of piebaldism. If such cases occur in man and 
axolotl, it seems unnecessary to seek in the ancestry of the albino for possibilities of either 
piebaldism or colour, as for example has been done in the case of mice. There is no reason 
to suppose brilliant colours latent in the normal axolotl, nor piebaldism latent in the normal 
man. At any rate where there is no evidence of it before the crosses take place, it is more 
reasonable to suppose it a product of the crosses themselves. 


Without entering fully into a matter which will shortly be discussed at length elsewhere, 
I would point out that when a hybrid is formed between a black dog which has bred true for 
many generations and an albino of another race, this hybrid is either black or black with white 
markings on chest, never in our experience so far a true piebald, but when this hybrid is crossed 
again with the albino, we obtain at once not only black dogs, or black dogs with white markings, 
but black and white piebalds, lilac and white piebalds, rusty black dogs, red dogs, and albinos; 
possibly as the work goes on other types will appear also. Much the same changes seem 
to arise from like crosses not only in mice and axolotl but also in man. 


In the case of our present piebald we have the father a pure coal-black negro and his 
brothers and sisters are like him, the mother is a mixture of negro and Spanish-Indian blood ; 
she has a fair skin, black eyes and long black hair, and brothers and sisters are fair skinned like 
her. There are six children; the three eldest are boys aged 11 years, 9 years, and 7½ years
respectively, the last being the piebald; the three youngest are girls aged 4 years, 2 years, and
9 months respectively. These last three children together with the piebald are given in Plate
VIII. The grandparents were all “normal,” i.e. not piebald. Plate VIII. suffices to show
not only the difference in colour between father and mother, but the range of colour in the three
daughters.

Plate caption: Lisbey, the Piebald Boy of El Cayo, British Honduras. From a photograph taken for
K. Pearson, through the courtesy of Robert H. Franklin, Esq., July, 1912. Plates
VIII and IX compared with X and XI indicate that the frequency and relative
size of the colour patches appear to have altered.

Plate caption: Lisbey, the Piebald Boy of El Cayo, British Honduras. From a photograph most
kindly provided by M. le Comte Maurice de Périgny. Lisbey is shown with his
Father. Front view, 1908. Reproduced from Pearson, Nettleship and Usher’s
Monograph on Albinism, Part I.

Plate XI. Lisbey, the Piebald Boy of El Cayo, British Honduras. From a photograph most
kindly provided by M. le Comte Maurice de Périgny. Lisbey is shown with his
Father. Back view, 1908. Reproduced from Pearson, Nettleship and Usher’s
Monograph on Albinism, Part II.


As soon as I had seen M. Blanchard’s account of this piebald boy I wrote to the District 
Commissioner, at El Cayo, Mr R. H. Franklin, and he most kindly sent me the fuller 
particulars here given, as well as arranged to have the family photographed for me. He tells 
me that the youngster is the pet of the place as well as its “curiosity” and that he is quite 
an intelligent boy. He was born on December 1st, 1904, at Peten in Guatemala and the reason
his parents give for his piebaldism is “that owing to an eclipse of the moon on the night of his 
birth, he caught it in the head and it scattered over his body.” There was no eclipse of the 
moon on that date nor any near it. 


One point further may be emphasised with regard to this boy. In our Monograph on 
Albinism, p. 248, Plate I (23)—(26), we deal with the case of a piebald boy from Papua and 
we give photographs taken at nine years’ interval. The dark patches have grown larger, but
they have not increased in number nor in relative size. In the present case a careful comparison
of the dark patches on the legs and arms of this Honduras piebald boy in the recent
photographs (Plates VIII. and IX.) of July 1912 and those of the Comte de Périgny of some four
years earlier (Plates X. and XI.), shows with almost absolute certainty that the relative area of
dark pigmentation has considerably increased. This remarkable fact renders the boy of special
interest, and it is to be hoped that in still later years photographs of him may be obtainable for
comparison with the present series.


IV. Selection and Intermediates in Bacillus coli. 
By LEONARD KEENE HIRSHBERG, M.A., M.D. 


In the course of some other work on a strain of Bacillus coli, taken from the rectum of a 
Scotch collie, and planted first in beef tea on October 5th, 1911, transplantations and agar plates 
were made of these organisms with the original purpose of studying what has been hitherto
called involution forms. The course of this work directed my attention to the possibility of 
these forms being dependent upon the quantity and quality of the pabulum or nutrient material 
furnished to the bacteria, and hence these so-called involution forms being actually types of 
polymorphism. 


Incidentally observations were made, in view of the claims made by the biologists represented 
by Professor Castle of Harvard on the one side, and Professor H. S. Jennings on the other, along 
the lines of possibly selecting races of long, short, and various intermediate generations of this 
colon bacillus. If it had proved possible to select a type of long, intermediate, or short bacilli, 
that had remained within the limits of the select mode without reverting back or possessing the 
power to generate all the types, it would have strengthened the work of Professor Castle on such 
higher animals as rats, with their manifold and necessarily complexly interrelated factors. As it 
is, however, after making two hundred and twenty transplantations of colonies of this strain of 
organisms, and trying to generate true long, short, thin, narrow, and intermediate types for both 
length and thickness, I find at the conclusion of that part of the work that there is absolutely 
no ground in my experiments supporting selection as an element in generating a particular type 
of these bacteria. 


Placed in suitable nutrient media, at room temperature, these bacilli divide very rapidly by 
simple fission. From twenty minutes to half an hour is the average time of division, yet if we 
allow an hour, it is seen that in twenty-four hours (the average time of the transplantations on
tubes or plates of various media for selecting) one bacillus gives generations numbering seventeen
millions of separate individuals.
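
The figure of seventeen millions follows from repeated doubling: one division an hour for twenty-four hours gives $2^{24}$ individuals. A minimal Python sketch of that arithmetic (the function name is our own, and it assumes every cell divides at each interval and none die):

```python
def descendants(hours, minutes_per_division=60):
    """Individuals descended from one cell by simple binary fission."""
    divisions = (hours * 60) // minutes_per_division
    return 2 ** divisions


print(descendants(24))  # 16777216, i.e. about seventeen millions
```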

When the nutrient material is favourable and incubator temperature is used, the types of the 
bacilli tend to be shorter and actually thinner—although relatively thicker in appearance— 
while the rate of fission is much increased. In solid media there is also a greater tendency to 
the small forms, while in peptone and other liquid and at the same time less favourable material 
the long tenuous and slowly dividing types are more common. 


When bizarre types like those formerly called “degenerated bacilli” such as flask-shaped, 
drum-stick, dumb-bell, Indian club, and tenpin-like variations made their appearance, it was 
always possible to transplant these and obtain polymorphous bacilli of all the previously 
known specimens. 


True enough, it was often difficult at first to start these unfavourably shaped types growing. 
They required close attention, frequent transplantations, and the best media such as milk, 
serum bouillon, and sugar agar or sugar gelatin, but they inevitably came around, and again 
generated every type that had been observed during the experiments. 


Hanging drop slides as well as basic aniline dyes were used in the work, and although there 
was no standard speed by which the motility factor could be studied, an incidental attempt was 
made to select slowly moving from rapidly motile organisms. This too was without success, and 
also depended evidently on food supply, temperature, and other environmental changes. The 
succeeding generations always produced both types and none generated true forms of motility. 


Although some attempt was made to call all types under two micromillimetres short, and all 
over six micromillimetres long, the intermediates were allowed a range of three to five micro- 
millimetres. All above five-tenths of a micromillimetre were thick, while thin ones were one or 
two-tenths. 


Summary and conclusion : From these experiments, which it must be emphasized are incidental 
to some other bacteriological studies, it seems that in the case at least of Bacillus coli, a condition 
of polymorphism exists.


Efforts at selection in two hundred and twenty-five transplantations of thousands of genera- 
tions each, resulted in absolute failure to obtain any true strain of form or motility. The 
organisms, while subject to great variations about the given mode of the variety according to its 
food and environment, always reverted to the previous mean in subsequent generations. 


All types were under the proper conditions possessed of the power of generating all other
types, hence selection as a method of generating any of these or any new type brought no
result.





CORRIGENDA. 


In Vol. vi. pp. 262—6 in the ‘‘Study of Pygmy Crania” by Miss H. Dorothy Smith, a slip occurs 
which is several times repeated. The crania dealt with are described as of the Third Dynasty. They 
belong to the XXVI—XXX Dynasties. As to the cemetery from which they were taken: see Flinders 
Petrie, Gizeh and Rifeh, p. 29. 


Mr J. I. Craig wishes to state that he has discovered the difference between Professor Myers’ 
measurements of head-length and those used in his own paper on the ‘‘Anthropometry of Modern 
Egyptians” (Biometrika, Vol. VIII. p. 78). Professor Myers measured as usual from the glabella;
Mr Craig was told that the prisoners’ heads were measured in the usual way (see loc. cit. p. 67, § 4), 
and supposed this to be also from the glabella. But he now finds that the Egyptian criminals are 
measured from the nasion. This explains the large differences between the head-lengths of the criminals 
and soldiers commented on in the Editorial footnote on p. 78. 








